Abstract

A basic result of large deviations theory is Sanov's theorem, which 
states that the sequence of empirical measures of independent and iden- 
tically distributed samples satisfies the large deviation principle with 
rate function given by relative entropy with respect to the common 
distribution. Large deviation principles for the empirical measures are 
also known to hold for broad classes of weakly interacting systems. 
When the interaction through the empirical measure corresponds to 
an absolutely continuous change of measure, the rate function can be 
expressed as relative entropy of a distribution with respect to the law 
of the McKean-Vlasov limit with measure-variable frozen at that dis-
tribution. We discuss situations, beyond that of tilted distributions, in
which a large deviation principle holds with rate function in relative
entropy form.

1 Introduction

Weakly interacting systems are families of particle systems whose compo- 
nents, for each fixed number N of particles, are statistically indistinguishable 
and interact only through the empirical measure of the N-particle system.
The study of weakly interacting systems originates in statistical mechanics
and kinetic theory; in this context, they are often referred to as mean field 
systems. 

The joint law of the random variables describing the states of the N-
particle system of a weakly interacting system is invariant under permuta-
tions of components, hence determined by the distribution of the associated
empirical measure. For large classes of weakly interacting systems, the law 
of large numbers is known to hold, that is, the sequence of N-particle empir-
ical measures converges to a deterministic probability measure as N tends
to infinity. The limit measure can often be characterized in terms of a limit
equation, which, by extrapolation from the important case of Markovian sys-
tems, is called McKean-Vlasov equation (cf. McKean, 1966). As with the
classical law of large numbers, different kinds of deviations of the prelimit
quantities (the N-particle empirical measures) from the limit quantity (the
McKean-Vlasov distribution) can be studied. Here we are interested in large 
deviations. 

Large deviations for the empirical measures of weakly interacting sys- 
tems, especially Markovian systems, have been the object of a number of 
works. The large deviation principle is usually obtained by transferring 
Sanov's theorem, which gives the large deviation principle for the empir- 
ical measures of independent and identically distributed samples, through 
an absolutely continuous change of measure. This approach works when 
the effect of the interaction through the empirical measure corresponds to 
a change of measure which is absolutely continuous with respect to some 
fixed reference distribution of product form. A way of transferring Sanov's 
theorem through an absolutely continuous change of measure is provided 
by Varadhan's lemma. In the case of Markovian dynamics, this yields
the large deviation principle on path space; see Léonard (1995a) for non-
degenerate jump diffusions, Dai Pra and den Hollander (1996) for a model
of Brownian particles in a potential field and random environment, and
Del Moral and Guionnet (1998) for a class of discrete-time Markov processes.
An extension of Varadhan's lemma tailored to the change of measure needed
for empirical measures is given in Del Moral and Zajic (2003) and applied to
a variety of non-degenerate weakly interacting systems. The large deviation
rate function in all those cases can be written in relative entropy form, that
is, expressed as relative entropy of a distribution with respect to the law of
the McKean-Vlasov limit with measure-variable frozen at that distribution;
cf. Remark 3.2 below.

In the case of Markovian dynamics, the large deviation principle on path 
space can be taken as the first step in deriving the large deviation prin-
ciple for the empirical processes; cf. Léonard (1995a) or Feng (1994a,b).
In Dawson and Gärtner (1987), the large deviation principle for the em-
pirical processes of weakly interacting Ito diffusions with non-degenerate
and measure-independent diffusion matrix is established in Freidlin-Wentzell
form starting from a process level representation of the rate function for
non-interacting Ito diffusions. The large deviation principle for interact-
ing diffusions is then derived by time discretization, local freezing of the
measure variable and an absolutely continuous change of measure with re-
spect to the resulting product distributions. A similar strategy is applied in
Djehiche and Kaj (1995) to a class of pure jump processes.



A different approach is taken in the early work of Tanaka (1984), where
the contraction principle is employed to derive the large deviation principle
on path space for the special case of Ito diffusions with identity diffusion ma-
trix. The contraction mapping in this case is actually a bijection. Using the
invariance of relative entropy under bi-measurable bijections, the rate func-
tion is shown to be of relative entropy form. In Léonard (1995b), the large
deviation upper bound, not the full principle, is derived by variational meth-
ods using Laplace functionals for certain pure jump Markov processes that do
not allow for an absolutely continuous change of measure. In Budhiraja et al.
(2012), the path space Laplace principle for weakly interacting Ito processes
with measure-dependent and possibly degenerate diffusion matrix is estab-
lished based on a variational representation of Laplace functionals, weak
convergence methods and ideas from stochastic optimal control. The rate 
function is given in variational form. 

The aim of this paper is to show that the large deviation principle holds 
with rate function in relative entropy form also for weakly interacting sys- 
tems that do not allow for an absolutely continuous change of measure with 
respect to product distributions. The large deviation principle in that form 
is a natural generalization of Sanov's theorem. Two classes of systems will 
be discussed: noise-based systems to which the contraction principle is ap- 
plicable, and systems described by general weakly interacting Ito processes. 

The random variables describing the states of the particles will be as- 
sumed to take values in a Polish space. The space of probability measures 
over a Polish space will be equipped, for simplicity, with the standard topol- 
ogy of weak convergence. Continuity of a functional with respect to the 
topology of weak convergence might be a rather restrictive condition. This 
restriction can be alleviated by considering the space of probability mea- 
sures that satisfy an integrability condition (for instance, finite moments of 
a certain order), equipped with the topology of weak(-star) convergence with 
respect to the corresponding class of continuous functions (for instance, Sec-
tion 2b) in Léonard, 1995a). The results presented below can be adapted to
this more general situation. 

The rest of this paper is organized as follows. In Section 2, we collect
basic definitions and results of the theory of large deviations in the con-
text of Polish spaces that will be used in the sequel; standard references for
our purposes are Dembo and Zeitouni (1998) and Dupuis and Ellis (1997).
In Section 3, we introduce a toy model of discrete-time weakly interacting
systems to illustrate the use of Varadhan's lemma, which in turn yields,
at least formally, a representation of the rate function in relative entropy
form. In Section 4, a class of weakly interacting systems is presented to
which the contraction principle is applicable but not necessarily the usual
change-of-measure technique. The large deviation rate function is shown
to be of the desired form thanks to a contraction property of relative en-
tropy. In Section 5, we discuss the case of weakly interacting Ito diffusions
with measure-dependent and possibly degenerate diffusion matrix studied
in Budhiraja et al. (2012). The variational form of the Laplace principle
rate function established there is shown to be expressible in relative entropy 
form. As a by-product, one obtains a variational representation of relative 
entropy with respect to Wiener measure. The Appendix contains two results 
regarding relative entropy: the contraction property mentioned above, which 
extends the well-known invariance property, and a direct proof of the varia- 
tional representation of relative entropy with respect to Wiener measure. 



2 Basic definitions and results 

Let $\mathcal{S}$ be a Polish space (i.e., a separable topological space metrizable with a
complete metric). Denote by $\mathcal{B}(\mathcal{S})$ the $\sigma$-algebra of Borel subsets of $\mathcal{S}$ and by
$\mathcal{P}(\mathcal{S})$ the space of probability measures on $\mathcal{B}(\mathcal{S})$ equipped with the topology
of weak convergence. For $\mu, \nu \in \mathcal{P}(\mathcal{S})$, let $R(\nu\|\mu)$ denote the relative entropy
of $\nu$ with respect to $\mu$, that is,

\[
R(\nu\|\mu) =
\begin{cases}
\int_{\mathcal{S}} \log\left(\frac{d\nu}{d\mu}(x)\right) \nu(dx) & \text{if $\nu$ is absolutely continuous w.r.t. $\mu$,} \\
\infty & \text{else.}
\end{cases}
\]

Relative entropy is well defined as a $[0,\infty]$-valued function, it is lower semi-
continuous as a function of both variables, and $R(\nu\|\mu) = 0$ if and only if
$\nu = \mu$.
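
On a finite state space the definition reduces to a finite sum. The following is a minimal sketch under that assumption; the function name `relative_entropy` and the example vectors are illustrative choices and do not appear in the text.

```python
import math


def relative_entropy(nu, mu):
    """R(nu || mu) in nats for two probability vectors on the same finite set.

    Returns math.inf when nu is not absolutely continuous w.r.t. mu,
    that is, when nu charges a point that mu does not.
    """
    total = 0.0
    for n_i, m_i in zip(nu, mu):
        if n_i == 0.0:
            continue  # convention 0 * log 0 = 0
        if m_i == 0.0:
            return math.inf
        total += n_i * math.log(n_i / m_i)
    return total


mu = [0.5, 0.25, 0.25]
print(relative_entropy(mu, mu))                    # equal measures
print(relative_entropy([0.8, 0.1, 0.1], mu))       # strictly positive
print(relative_entropy(mu, [0.5, 0.5, 0.0]))       # not absolutely continuous
```

The three calls illustrate the three properties used later: the value vanishes for equal arguments, is positive otherwise, and is infinite when absolute continuity fails.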

Let $(\xi^n)_{n\in\mathbb{N}}$ be a sequence of $\mathcal{S}$-valued random variables. A rate function
on $\mathcal{S}$ is a lower semicontinuous function $\mathcal{S} \to [0,\infty]$. Let $I$ be a rate function
on $\mathcal{S}$. By lower semicontinuity, the sublevel sets of $I$, i.e., the sets $I^{-1}([0,c])$
for $c \in [0,\infty)$, are closed. A rate function is said to be good if its sublevel
sets are compact.

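Definition 2.1. The sequence $(\xi^n)_{n\in\mathbb{N}}$ satisfies the large deviation principle
with rate function $I$ if for all $B \in \mathcal{B}(\mathcal{S})$,

\[
-\inf_{x\in B^\circ} I(x) \le \liminf_{n\to\infty} \frac{1}{n} \log P\{\xi^n \in B\}
\le \limsup_{n\to\infty} \frac{1}{n} \log P\{\xi^n \in B\} \le -\inf_{x\in \mathrm{cl}(B)} I(x),
\]

where $\mathrm{cl}(B)$ denotes the closure and $B^\circ$ the interior of $B$.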

Definition 2.2. The sequence $(\xi^n)$ satisfies the Laplace principle with rate
function $I$ if for all $G \in C_b(\mathcal{S})$,

\[
\lim_{n\to\infty} -\frac{1}{n} \log E\left[\exp\left(-n \cdot G(\xi^n)\right)\right] = \inf_{x\in\mathcal{S}} \left\{ I(x) + G(x) \right\},
\]

where $C_b(\mathcal{S})$ denotes the space of all bounded continuous functions $\mathcal{S} \to \mathbb{R}$.

Clearly, the large deviation principle (or Laplace principle) is a distribu-
tional property. The rate function of a large deviation principle is unique;
see, for instance, Lemma 4.1.4 in Dembo and Zeitouni (1998, p. 117). The
large deviation principle holds with a good rate function if and only if the
Laplace principle holds with a good rate function, and the rate function is
the same; see, for instance, Theorem 4.4.13 in Dembo and Zeitouni (1998,
p. 146).
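
The Laplace principle can be checked numerically in a simple case. The sketch below assumes $\xi^n$ is the mean of $n$ independent Bernoulli($p$) variables on $[0,1]$, whose good rate function is $I(x) = R(\mathrm{Ber}(x)\|\mathrm{Ber}(p))$; this example, the test function $G$ and all function names are illustrative assumptions and are not taken from the text.

```python
import math


def bernoulli_rate(x, p):
    """Rate function I(x) = R(Ber(x) || Ber(p)) for x in [0, 1]."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(x, p) + term(1.0 - x, 1.0 - p)


def log_binom_pmf(n, k, p):
    """log P(S_n = k) for S_n ~ Binomial(n, p), computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def laplace_lhs(n, p, G):
    """-(1/n) log E[exp(-n G(S_n / n))], evaluated exactly via log-sum-exp."""
    logs = [log_binom_pmf(n, k, p) - n * G(k / n) for k in range(n + 1)]
    m = max(logs)
    return -(m + math.log(sum(math.exp(v - m) for v in logs))) / n


def laplace_rhs(p, G, grid=10_000):
    """inf over x in [0, 1] of I(x) + G(x), approximated on a uniform grid."""
    return min(bernoulli_rate(j / grid, p) + G(j / grid) for j in range(grid + 1))


def G(x):
    # Hypothetical bounded continuous test function on [0, 1].
    return (x - 1.0) ** 2


for n in (20, 200, 2000):
    print(n, laplace_lhs(n, 0.5, G), laplace_rhs(0.5, G))
```

The left-hand side is computed exactly from the binomial distribution, so the remaining gap to the infimum reflects only the finite value of $n$ and the grid used for the infimum.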

The fact that, for good rate functions, the large deviation principle im-
plies the Laplace principle is a consequence of Varadhan's integral lemma;
see Theorem 3.4 in Varadhan (1966). Another consequence of Varadhan's
lemma is the first of the following two basic transfer results, given here as
Theorem 2.1; cf. Theorem II.7.2 in Ellis (1985, p. 52).

Theorem 2.1 (Change of measure, Varadhan). Let $(\xi^n)$ be a sequence of $\mathcal{S}$-
valued random variables such that $(\xi^n)$ satisfies the large deviation principle
with good rate function $I$. Let $(\tilde{\xi}^n)_{n\in\mathbb{N}}$ be a second sequence of $\mathcal{S}$-valued
random variables. Suppose that, for every $n \in \mathbb{N}$, $\mathrm{Law}(\tilde{\xi}^n)$ is absolutely
continuous with respect to $\mathrm{Law}(\xi^n)$ with density

\[
\frac{d\,\mathrm{Law}(\tilde{\xi}^n)}{d\,\mathrm{Law}(\xi^n)}(x) = \exp\left(n \cdot F(x)\right), \quad x \in \mathcal{S},
\]

where $F: \mathcal{S} \to \mathbb{R}$ is continuous and such that

\[
\lim_{L\to\infty} \limsup_{n\to\infty} \frac{1}{n} \log E\left[ \mathbf{1}_{[L,\infty)}\left(F(\xi^n)\right) \cdot \exp\left(n \cdot F(\xi^n)\right) \right] = -\infty.
\]

Then $(\tilde{\xi}^n)_{n\in\mathbb{N}}$ satisfies the large deviation principle with good rate function
$I - F$.

The second basic transfer result is the contraction principle, given here as
Theorem 2.2; see, for instance, Theorem 4.2.1 and Remark (c) in Dembo and Zeitouni
(1998, pp. 126-127).

Theorem 2.2 (Contraction principle). Let $(\xi^n)$ be a sequence of $\mathcal{S}$-valued
random variables such that $(\xi^n)$ satisfies the large deviation principle with
good rate function $I$. Let $\psi: \mathcal{S} \to \mathcal{Y}$ be a measurable function, $\mathcal{Y}$ be a Polish
space. If $\psi$ is continuous on $I^{-1}([0,\infty))$, then $(\psi(\xi^n))$ satisfies the large
deviation principle with good rate function

\[
J(y) = \inf_{x \in \psi^{-1}(y)} I(x), \quad y \in \mathcal{Y},
\]

where $\inf \emptyset = \infty$ by convention.

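Let $X_1, X_2, \ldots$ be $\mathcal{S}$-valued independent and identically distributed ran-
dom variables with common distribution $\mu \in \mathcal{P}(\mathcal{S})$ defined on some prob-
ability space $(\Omega, \mathcal{F}, P)$. For $n \in \mathbb{N}$, let $\mu^n$ be the empirical measure of
$X_1, \ldots, X_n$, that is,

\[
\mu^n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},
\]

where $\delta_x$ denotes the Dirac measure concentrated in $x \in \mathcal{S}$. Sanov's theorem
gives the large deviation principle for $(\mu^n)_{n\in\mathbb{N}}$; see, for instance, Section 6.2 in
Dembo and Zeitouni (1998) and Dupuis and Ellis (1997, pp. 39-52). Recall that
$\mathcal{P}(\mathcal{S})$ is equipped with the topology of weak convergence of measures.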

Theorem 2.3 (Sanov). The sequence $(\mu^n)_{n\in\mathbb{N}}$ of $\mathcal{P}(\mathcal{S})$-valued random vari-
ables satisfies the large deviation principle with good rate function

\[
I(\theta) = R(\theta\|\mu), \quad \theta \in \mathcal{P}(\mathcal{S}).
\]
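
Sanov's theorem can be illustrated on the two-point space $\{0,1\}$. The sketch below assumes a common distribution $\mathrm{Ber}(p)$ and the closed set $\{\theta : \theta(\{1\}) \ge a\}$ with $a > p$, on which the infimum of $R(\theta\|\mu)$ equals $R(\mathrm{Ber}(a)\|\mathrm{Ber}(p))$; the function names and the parameter values are illustrative assumptions, not part of the text.

```python
import math


def kl_bernoulli(a, p):
    """Relative entropy R(Ber(a) || Ber(p)) in nats."""
    def term(x, y):
        return 0.0 if x == 0.0 else x * math.log(x / y)
    return term(a, p) + term(1.0 - a, 1.0 - p)


def log_binom_tail(n, p, a):
    """log P(mu^n({1}) >= a), that is, log P(S_n >= a * n) for S_n ~ Binomial(n, p).

    Uses log-sum-exp so that large n does not underflow. The threshold a * n
    is exact in floating point for the dyadic a and the n used below.
    """
    k0 = math.ceil(a * n)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(k0, n + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


p, a = 0.5, 0.75
rate = kl_bernoulli(a, p)
for n in (8, 80, 800, 8000):
    # Empirical decay rate -(1/n) log P versus the Sanov rate.
    print(n, -log_binom_tail(n, p, a) / n, rate)
```

The printed decay rates stay above the relative entropy, as the Chernoff bound requires, and approach it as $n$ grows.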

We are interested in analogous results for the empirical measures of
weakly interacting systems. For $N \in \mathbb{N}$, let $X_1^N, \ldots, X_N^N$ be $\mathcal{S}$-valued ran-
dom variables defined on some probability space $(\Omega_N, \mathcal{F}_N, P_N)$. Denote by
$\mu^N$ the empirical measure of $X_1^N, \ldots, X_N^N$.

Definition 2.3. The triangular array $(X_i^N)_{N\in\mathbb{N},\, i\in\{1,\ldots,N\}}$ is called a weakly
interacting system if the following hold:

(i) for each $N \in \mathbb{N}$, $X_1^N, \ldots, X_N^N$ is a finite exchangeable sequence;

(ii) the family $(\mu^N)_{N\in\mathbb{N}}$ of $\mathcal{P}(\mathcal{S})$-valued random variables is tight.

Recall that a finite sequence $Y_1, \ldots, Y_N$ of random variables with values in
a common measurable space is called exchangeable if its joint distribution is
invariant under permutations of the components, that is, $\mathrm{Law}(Y_1, \ldots, Y_N) =
\mathrm{Law}(Y_{\sigma(1)}, \ldots, Y_{\sigma(N)})$ for every permutation $\sigma$ of $\{1, \ldots, N\}$. A weakly in-
teracting system $(X_i^N)$ is said to satisfy the law of large numbers if there
exists $\mu \in \mathcal{P}(\mathcal{S})$ such that $(\mu^N)$ converges to $\mu$ in distribution or, equiva-
lently, $(P_N \circ (\mu^N)^{-1})$ converges weakly to $\delta_\mu$. Weakly interacting systems are
sometimes called mean field systems. In the situation of Theorem 2.3, set-
ting $X_i^N = X_i$, $N \in \mathbb{N}$, $i \in \{1, \ldots, N\}$, defines a weakly interacting system
that satisfies the law of large numbers, the limit measure being the common
sample distribution.



3 A toy model and the desired form of the rate 
function 

For $N \in \mathbb{N}$, let $(Y_i^N(t))_{i\in\{1,\ldots,N\},\, t\in\{0,1\}}$ be an independent family of standard
normal real random variables on some probability space $(\Omega, \mathcal{F}, P)$. Let $b$:
$\mathbb{R} \to \mathbb{R}$ be measurable; below we will assume $b$ to be bounded and continuous.
Define real random variables $X_1^N(t), \ldots, X_N^N(t)$, $t \in \{0, 1\}$, by

\[
X_i^N(0) = Y_i^N(0), \qquad X_i^N(1) = X_i^N(0) + \frac{1}{N} \sum_{j=1}^{N} b\left(X_j^N(0)\right) + Y_i^N(1). \tag{3.1}
\]

We may interpret the variables $X_i^N(t)$ as the states of the components of an
$N$-particle system at times $t \in \{0, 1\}$. Let $\mu^N$ be the empirical measure of
the $N$-particle system on "path space," that is,

\[
\mu^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i^N} = \frac{1}{N} \sum_{i=1}^{N} \delta_{(X_i^N(0),\, X_i^N(1))}.
\]

Notice that the components of $X^N = (X_1^N, \ldots, X_N^N)$ are identically distributed and interact
only through $\mu^N$ since

\[
\frac{1}{N} \sum_{j=1}^{N} b\left(X_j^N(0)\right) = \int_{\mathbb{R}^2} b(x)\, d\mu^N(x, \tilde{x})
\]

and the variables $Y_i^N(t)$ are independent and identically distributed. The
sequence $X_1^N, \ldots, X_N^N$ of $\mathbb{R}^2$-valued random variables is exchangeable.
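
One realisation of the toy model (3.1) can be simulated directly. The sketch below is only an illustration: the choice $b = \tanh$ (one bounded continuous function among many) and the function name `simulate_toy_model` are assumptions that do not appear in the text.

```python
import math
import random


def simulate_toy_model(n_particles, b, rng):
    """Draw one realisation of (3.1).

    Returns the noise Y and the states X as lists of pairs
    (value at time 0, value at time 1), one pair per particle.
    """
    y = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_particles)]
    # Mean-field term (1/N) * sum_j b(X_j(0)); recall that X_j(0) = Y_j(0).
    drift = sum(b(y0) for y0, _ in y) / n_particles
    x = [(y0, y0 + drift + y1) for y0, y1 in y]
    return y, x


rng = random.Random(0)
y, x = simulate_toy_model(1000, math.tanh, rng)
# The interaction enters every particle through the same number, which is the
# integral of b against the first marginal of the empirical measure of x.
print(sum(math.tanh(x0) for x0, _ in x) / len(x))
```

All particles share the same drift term, which makes the exchangeability of $X_1^N, \ldots, X_N^N$ visible in the code: permuting the noise pairs permutes the states.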
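Let $\lambda^N$ denote the empirical measure of $Y^N = (Y_1^N, \ldots, Y_N^N)$. By Sanov's
theorem, $(\lambda^N)_{N\in\mathbb{N}}$ satisfies the large deviation principle with good rate func-
tion $R(\cdot\|\gamma_0)$, where $\gamma_0$ is the bivariate standard normal distribution. Follow-
ing the usual way of deriving the large deviation principle, we observe that,
for every $N \in \mathbb{N}$, the law of $\mu^N$ is absolutely continuous with respect to the
law of $\lambda^N$. To see this, set, for $y, \tilde{y} \in \mathbb{R}^N$, $\theta \in \mathcal{P}(\mathbb{R}^2)$,

\[
\nu_{(y,\tilde{y})} = \frac{1}{N} \sum_{i=1}^{N} \delta_{(y_i, \tilde{y}_i)}, \qquad m_b(\theta) = \int_{\mathbb{R}^2} b(x)\, d\theta(x, \tilde{x}),
\]

\[
\nu_y = \frac{1}{N} \sum_{i=1}^{N} \delta_{y_i}, \qquad m_b(y) = \left( \int b(x)\, \nu_y(dx), \ldots, \int b(x)\, \nu_y(dx) \right) \in \mathbb{R}^N.
\]

Define functions $f: \mathcal{P}(\mathbb{R}^2) \times \mathbb{R}^2 \to \mathbb{R}$ and $F: \mathcal{P}(\mathbb{R}^2) \to [-\infty, \infty)$ according to

\[
f(\theta, (y, \tilde{y})) = (y + m_b(\theta)) \cdot \tilde{y} - \frac{1}{2} \left| y + m_b(\theta) \right|^2,
\]

\[
F(\theta) =
\begin{cases}
\int_{\mathbb{R}^2} f(\theta, (y, \tilde{y}))\, d\theta(y, \tilde{y}) & \text{if $f(\theta, \cdot)$ is $\theta$-integrable,} \\
-\infty & \text{otherwise.}
\end{cases}
\]

Then the law of $X^N$ is absolutely continuous with respect to the law of $Y^N$
with density given by

\[
\frac{d\,\mathrm{Law}(X^N)}{d\,\mathrm{Law}(Y^N)}(y, \tilde{y}) = \exp\left( \langle y + m_b(y), \tilde{y} \rangle - \frac{1}{2} \left| y + m_b(y) \right|^2 \right) = \exp\left( N \cdot F\left(\nu_{(y,\tilde{y})}\right) \right). \tag{3.2}
\]

Since $\mu^N = \nu_{(X^N(0), X^N(1))}$ and $\lambda^N = \nu_{(Y^N(0), Y^N(1))}$, it follows from Equa-
tion (3.2) that

\[
\frac{d\,\mathrm{Law}(\mu^N)}{d\,\mathrm{Law}(\lambda^N)}(\theta) = \exp\left( N \cdot F(\theta) \right), \quad \theta \in \mathcal{P}(\mathbb{R}^2). \tag{3.3}
\]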

The densities given by Equation (3.3) are of the form required by Theo-
rem 2.1, the change of measure version of Varadhan's lemma. Assume from
now on that $b$ is bounded and continuous. Then $F$ is upper semicontinuous
and the tail condition in Theorem 2.1 is satisfied. However, $F$ is discontinu-
ous at any $\theta \in \mathcal{P}(\mathbb{R}^2)$ such that $F(\theta) > -\infty$. Indeed, let $\eta$ be the univariate
standard Cauchy distribution and set $\theta_n = (1 - \frac{1}{n})\theta + \frac{1}{n}\, \delta_0 \otimes \eta$, $n \in \mathbb{N}$. Then
$\theta_n \to \theta$ weakly, while $F(\theta_n) = -\infty$ for all $n$.

Although Theorem 2.1 cannot be applied directly, an approximation ar-
gument based on Varadhan's lemma could be used to show (cf. Remark 3.1
below) that the sequence of empirical measures $(\mu^N)_{N\in\mathbb{N}}$ satisfies the large
deviation principle with good rate function

\[
I(\theta) = R(\theta\|\gamma_0) - F(\theta), \quad \theta \in \mathcal{P}(\mathbb{R}^2). \tag{3.4}
\]



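The function $I$ in (3.4) can be rewritten in terms of relative entropy as
follows. Define a mapping $\psi: \mathcal{P}(\mathbb{R}^2) \times \mathbb{R}^2 \to \mathbb{R}^2$ by

\[
\psi(\theta, (y, \tilde{y})) = (y,\, y + m_b(\theta) + \tilde{y}). \tag{3.5}
\]

For $\theta \in \mathcal{P}(\mathbb{R}^2)$, let $\Psi_{\gamma_0}(\theta)$ be the image measure of $\gamma_0$ under $\psi(\theta, \cdot)$. Then
$\Psi_{\gamma_0}(\theta)$ is equivalent to $\gamma_0$ with density given by

\[
\frac{d\Psi_{\gamma_0}(\theta)}{d\gamma_0}(y, \tilde{y}) = \exp\left( f(\theta, (y, \tilde{y})) \right).
\]

If $\theta$ is not absolutely continuous with respect to $\Psi_{\gamma_0}(\theta)$, then $R(\theta\|\Psi_{\gamma_0}(\theta)) =
\infty = R(\theta\|\gamma_0)$. If $\theta$ is absolutely continuous with respect to $\Psi_{\gamma_0}(\theta)$, then

\[
R\left(\theta\|\Psi_{\gamma_0}(\theta)\right) = \int \log\left( \frac{d\theta}{d\gamma_0} \right) d\theta - \int \log\left( \frac{d\Psi_{\gamma_0}(\theta)}{d\gamma_0} \right) d\theta = R(\theta\|\gamma_0) - F(\theta).
\]

Consequently, for all $\theta \in \mathcal{P}(\mathbb{R}^2)$,

\[
I(\theta) = R\left(\theta\|\Psi_{\gamma_0}(\theta)\right). \tag{3.6}
\]

Notice that $\Psi_{\gamma_0}(\theta)$ is the law of a one-particle system with measure variable
frozen at $\theta$; $\Psi_{\gamma_0}(\theta)$ can also be interpreted as the solution of the McKean-
Vlasov equation for the toy model with measure variable frozen at $\theta$.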
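
The identity between the two forms of the rate function, $R(\theta\|\gamma_0) - F(\theta)$ and the relative entropy with respect to the frozen law, can be verified in closed form for Gaussian $\theta$. The sketch below assumes $\theta = N(a, s^2) \otimes N(c, r^2)$ and $b = \cos$, for which $m_b(\theta) = \cos(a)\, e^{-s^2/2}$; these choices and all function names are illustrative assumptions made so that every integral is explicit.

```python
import math


def m_b_cos(a, s2):
    """m_b(theta) for b = cos and first marginal N(a, s2): cos(a) * exp(-s2 / 2)."""
    return math.cos(a) * math.exp(-s2 / 2.0)


def entropy_wrt_gamma0(a, s2, c, r2):
    """R(theta || gamma_0) for theta = N(a, s2) x N(c, r2) and gamma_0 = N(0, I_2)."""
    return (0.5 * (s2 + a * a - 1.0 - math.log(s2))
            + 0.5 * (r2 + c * c - 1.0 - math.log(r2)))


def F(a, s2, c, r2):
    """F(theta): integral of f(theta, .) against theta, with
    f = (y + m) * yt - (y + m)^2 / 2 and independent coordinates under theta."""
    m = m_b_cos(a, s2)
    return (a + m) * c - 0.5 * (s2 + (a + m) ** 2)


def entropy_wrt_frozen_law(a, s2, c, r2):
    """R(theta || Psi_{gamma_0}(theta)) via the Gaussian relative entropy formula.

    Psi_{gamma_0}(theta) is the law of (y, y + m + yt) with (y, yt) ~ N(0, I_2):
    mean (0, m), covariance [[1, 1], [1, 2]], inverse [[2, -1], [-1, 1]], determinant 1.
    """
    m = m_b_cos(a, s2)
    d1, d2 = -a, m - c                      # mean of Psi minus mean of theta
    trace = 2.0 * s2 + r2                   # trace of inverse covariance times diag(s2, r2)
    quad = 2.0 * d1 * d1 - 2.0 * d1 * d2 + d2 * d2
    return 0.5 * (trace + quad - 2.0 - math.log(s2 * r2))


params = (0.3, 1.5, -0.7, 0.8)
print(entropy_wrt_frozen_law(*params))
print(entropy_wrt_gamma0(*params) - F(*params))
```

Both printed numbers agree up to rounding, which is the Gaussian instance of the computation leading from (3.4) to the relative entropy form.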

Remark 3.1. A version of Varadhan's lemma (or Theorem 2.1) that allows
to rigorously derive the large deviation principle for $(\mu^N)$ with rate function
in relative entropy form is provided by Lemma 1.1 in Del Moral and Zajic
(2003). Observe that the density of $\mathrm{Law}(X^N)$ may be computed with respect
to product measures different from $\mathrm{Law}(Y^N) = \otimes^N \gamma_0$. A natural alternative
is the product $\otimes^N \Psi_{\gamma_0}(\mu_*)$, where $\mu_*$ is the (unique) solution of the fixed point
equation $\mu = \Psi_{\gamma_0}(\mu)$; $\mu_*$ can be seen as the McKean-Vlasov distribution of
the toy model. We do not give the details here. The results of Section 4,
based on different arguments, will imply that $(\mu^N)_{N\in\mathbb{N}}$ satisfies the large
deviation principle with good rate function $I$ as given by Equation (3.6); see
Example 4.1 below.

Remark 3.2. Equation (3.6) gives the desired form of the rate function in
terms of relative entropy. More generally, suppose that $\Psi: \mathcal{P}(\mathcal{S}) \to \mathcal{P}(\mathcal{S})$ is
continuous, where $\mathcal{S}$ is a Polish space. Then the function

\[
J(\theta) = R\left(\theta\|\Psi(\theta)\right), \quad \theta \in \mathcal{P}(\mathcal{S}),
\]

is lower semicontinuous with values in $[0,\infty]$, hence a rate function, and it
is in relative entropy form. The lower semicontinuity of $J$ follows from the
lower semicontinuity of relative entropy jointly in both its arguments and the
continuity of $\Psi$. If, in addition, $\mathrm{range}(\Psi) = \{\Psi(\theta) : \theta \in \mathcal{P}(\mathcal{S})\}$ is compact in
$\mathcal{P}(\mathcal{S})$, then the sublevel sets of $J$ are compact and $J$ is a good rate function.
Indeed, compactness of $\mathrm{range}(\Psi)$ implies tightness, and the compactness of
the sublevel sets of $J$, which are closed by lower semicontinuity, follows as
in the proof of Lemma 1.4.3(c) in Dupuis and Ellis (1997, pp. 29-31).



4 Noise-based systems 

Let $\mathcal{X}$, $\mathcal{Y}$ be Polish spaces. For $N \in \mathbb{N}$, let $X_1^N, \ldots, X_N^N$ be $\mathcal{X}$-valued random
variables defined on some probability space $(\Omega_N, \mathcal{F}_N, P_N)$. Denote by $\mu^N$ the
empirical measure of $X_1^N, \ldots, X_N^N$. We suppose that there are a probability
measure $\gamma_0 \in \mathcal{P}(\mathcal{Y})$ and a Borel measurable mapping $\psi: \mathcal{P}(\mathcal{X}) \times \mathcal{Y} \to \mathcal{X}$ such
that the following representation for the triangular array $(X_i^N)_{i\in\{1,\ldots,N\},\, N\in\mathbb{N}}$
holds: For each $N \in \mathbb{N}$, there is a sequence $Y_1^N, \ldots, Y_N^N$ of independent
and identically distributed $\mathcal{Y}$-valued random variables on $(\Omega_N, \mathcal{F}_N, P_N)$ with
common distribution $\gamma_0$ such that for all $i \in \{1, \ldots, N\}$,

\[
X_i^N(\omega) = \psi\left(\mu^N_\omega, Y_i^N(\omega)\right), \quad P_N\text{-almost all } \omega \in \Omega_N. \tag{4.1}
\]

The above representation entails by symmetry that, for $N$ fixed, the sequence
$X_1^N, \ldots, X_N^N$ is exchangeable. Representation (4.1) also implies that $\mu^N$
satisfies the equation

\[
\mu^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{\psi(\mu^N, Y_i^N)} \quad P_N\text{-almost surely.} \tag{4.2}
\]

In order to describe the limit behavior of the sequence of empirical mea-
sures $(\mu^N)_{N\in\mathbb{N}}$, define a mapping $\Psi: \mathcal{P}(\mathcal{Y}) \times \mathcal{P}(\mathcal{X}) \to \mathcal{P}(\mathcal{X})$ by

\[
(\gamma, \mu) \mapsto \Psi_\gamma(\mu) = \gamma \circ \psi(\mu, \cdot)^{-1}. \tag{4.3}
\]

Thus $\Psi_\gamma(\mu)$ is the image measure of $\gamma$ under the mapping $\mathcal{Y} \ni y \mapsto \psi(\mu, y)$.
Equivalently, $\Psi_\gamma(\mu) = \mathrm{Law}(\psi(\mu, Y))$ with $Y$ any $\mathcal{Y}$-valued random variable
with distribution $\gamma$. Limit points of $(\mu^N)_{N\in\mathbb{N}}$ will be described in terms of
solutions to the fixed point equation

\[
\mu = \Psi_\gamma(\mu). \tag{4.4}
\]

Assume that there is a Borel measurable set $\mathcal{D} \subset \mathcal{P}(\mathcal{Y})$ such that the
following properties hold:

(A1) Equation (4.4) has a unique fixed point $\mu_*(\gamma)$ for every $\gamma \in \mathcal{D}$, and the
mapping $\mathcal{D} \ni \gamma \mapsto \mu_*(\gamma) \in \mathcal{P}(\mathcal{X})$ is Borel measurable.

(A2) For all $N \in \mathbb{N}$,

\[
\gamma_0^{\otimes N} \left\{ (y_1, \ldots, y_N) \in \mathcal{Y}^N : \frac{1}{N} \sum_{i=1}^{N} \delta_{y_i} \in \mathcal{D} \right\} = 1.
\]

(A3) If $\gamma \in \mathcal{P}(\mathcal{Y})$ is such that $R(\gamma\|\gamma_0) < \infty$, then $\gamma \in \mathcal{D}$ and $\mu_*|_{\mathcal{D}}$ is
continuous at $\gamma$.

Assumption (A2) implies that Equation (4.4) possesses a unique solution
for almost all (with respect to products of $\gamma_0$) probability measures of em-
pirical measure form. Such probability measures are therefore in the domain
of definition of the mapping $\mu_*|_{\mathcal{D}}$. According to Assumption (A3), also all
probability measures $\gamma$ with finite $\gamma_0$-relative entropy are in the domain of
definition of $\mu_*|_{\mathcal{D}}$, which is continuous at any such $\gamma$ in the topology of weak
convergence.

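Theorem 4.1. Grant (A1) - (A3). Then the sequence $(\mu^N)_{N\in\mathbb{N}}$ satisfies the
large deviation principle with good rate function $I: \mathcal{P}(\mathcal{X}) \to [0, \infty]$ given by

\[
I(\eta) = \inf_{\gamma \in \mathcal{D} :\, \mu_*(\gamma) = \eta} R(\gamma\|\gamma_0),
\]

where $\inf \emptyset = \infty$ by convention.

Proof. The assertion follows from Sanov's theorem and the contraction prin-
ciple. To see this, let $\lambda^N$ denote the empirical measure of $Y_1^N, \ldots, Y_N^N$. Then
for $P_N$-almost all $\omega \in \Omega_N$,

\[
\mu^N_\omega = \frac{1}{N} \sum_{i=1}^{N} \delta_{\psi(\mu^N_\omega, Y_i^N(\omega))} = \lambda^N_\omega \circ \psi(\mu^N_\omega, \cdot)^{-1} = \Psi_{\lambda^N_\omega}(\mu^N_\omega). \tag{4.5}
\]

Thus $\mu^N = \Psi_{\lambda^N}(\mu^N)$ with probability one. For $P_N$-almost all $\omega \in \Omega_N$, $\lambda^N_\omega \in
\mathcal{D}$ by Assumption (A2) and, by uniqueness according to (A1), $\mu_*(\lambda^N_\omega) = \mu^N_\omega$.
By Theorem 2.3 (Sanov), $(\lambda^N)_{N\in\mathbb{N}}$ satisfies the large deviation principle with
good rate function $R(\cdot\|\gamma_0)$. By Assumption (A3), $\mu_*(\cdot)$ is defined and con-
tinuous on $\{\gamma \in \mathcal{P}(\mathcal{Y}) : R(\gamma\|\gamma_0) < \infty\}$. Theorem 2.2 (contraction principle)
therefore applies, and it follows that $(\mu_*(\lambda^N))_{N\in\mathbb{N}}$, hence $(\mu^N)_{N\in\mathbb{N}}$, satisfies
the large deviation principle with good rate function

\[
\mathcal{P}(\mathcal{X}) \ni \eta \mapsto \inf_{\gamma \in \mathcal{D} :\, \mu_*(\gamma) = \eta} R(\gamma\|\gamma_0).
\]

□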

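The rate function of Theorem 4.1 can be expressed in relative entropy
form as in Remark 3.2. The key observation is the contraction property of
relative entropy established in Lemma A.1 in the appendix.

Corollary 4.2. Let $I$ be the rate function of Theorem 4.1. Then for all
$\eta \in \mathcal{P}(\mathcal{X})$,

\[
I(\eta) = R\left(\eta\|\Psi_{\gamma_0}(\eta)\right).
\]

Proof. Let $\eta \in \mathcal{P}(\mathcal{X})$. The mapping $\mathcal{P}(\mathcal{Y}) \ni \gamma \mapsto \Psi_\gamma(\eta) \in \mathcal{P}(\mathcal{X})$ is Borel
measurable. Since $\{\gamma \in \mathcal{P}(\mathcal{Y}) : R(\gamma\|\gamma_0) < \infty\} \subset \mathcal{D}$ and $\inf \emptyset = \infty$,

\[
\inf_{\gamma \in \mathcal{D} :\, \mu_*(\gamma) = \eta} R(\gamma\|\gamma_0) = \inf_{\gamma \in \mathcal{D} :\, \Psi_\gamma(\eta) = \eta} R(\gamma\|\gamma_0) = \inf_{\gamma \in \mathcal{P}(\mathcal{Y}) :\, \Psi_\gamma(\eta) = \eta} R(\gamma\|\gamma_0).
\]

By Lemma A.1 it follows that

\[
\inf_{\gamma \in \mathcal{P}(\mathcal{Y}) :\, \Psi_\gamma(\eta) = \eta} R(\gamma\|\gamma_0) = R\left(\eta\|\Psi_{\gamma_0}(\eta)\right).
\]

□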



Example 4.1. Consider the toy model of Section 3. Suppose that $b \in C_b(\mathbb{R})$.
Then $\theta \mapsto m_b(\theta) = \int b(x)\, d\theta(x, \tilde{x})$ is bounded and continuous as a mapping
$\mathcal{P}(\mathbb{R}^2) \to \mathbb{R}$. Observe that $m_b(\theta)$ depends only on the first marginal of $\theta$. Set
$\mathcal{X} = \mathbb{R}^2$, $\mathcal{Y} = \mathbb{R}^2$, let $\gamma_0$ be the bivariate standard normal distribution, and
define $\psi: \mathcal{P}(\mathbb{R}^2) \times \mathbb{R}^2 \to \mathbb{R}^2$ according to (3.5). Recalling Equation (3.1), one
sees that the toy model satisfies representation (4.1). Based on $\psi$, define $\Psi$
according to (4.3). Given any $\gamma \in \mathcal{P}(\mathbb{R}^2)$, the mapping $\mu \mapsto \Psi_\gamma(\mu)$ possesses
a unique fixed point $\mu_*(\gamma)$. To see this, suppose that $\theta \in \mathcal{P}(\mathbb{R}^2)$ is a fixed
point, that is, $\theta = \Psi_\gamma(\theta) = \gamma \circ \psi(\theta, \cdot)^{-1}$. Let $X = (X(0), X(1))$, $Y =
(Y(0), Y(1))$ be two $\mathbb{R}^2$-valued random variables on some probability space
$(\Omega, \mathcal{F}, P)$ with distribution $\theta$ and $\gamma$, respectively. By the fixed point property,
$\mathrm{Law}(X) = \mathrm{Law}(\psi(\theta, Y))$. By definition of $\psi$, $\mathrm{Law}(X(0)) = \mathrm{Law}(Y(0))$.
Since $m_b(\theta)$ depends on $\theta = \mathrm{Law}(X)$ only through its first marginal, which
is equal to $\mathrm{Law}(X(0)) = \mathrm{Law}(Y(0))$, we have $m_b(\theta) = m_b(\gamma)$. It follows
that, for all $B_0, B_1 \in \mathcal{B}(\mathbb{R})$,

\[
P\left( X(1) \in B_1 \mid X(0) \in B_0 \right) = P\left( Y(0) + m_b(\gamma) + Y(1) \in B_1 \mid Y(0) \in B_0 \right).
\]

This determines the conditional distribution of $X(1)$ given $X(0)$ and, since
$\mathrm{Law}(X(0)) = \mathrm{Law}(Y(0))$, also the joint law of $X(0)$ and $X(1)$. In fact,
$\mathrm{Law}(X) = \Psi_\gamma(\gamma)$. Consequently, $\mu_*(\gamma) = \Psi_\gamma(\gamma)$ is the unique solution of
Equation (4.4). By the extended mapping theorem for weak convergence
(Theorem 5.4 in Billingsley, 1968, p. 34) and since $m_b(\cdot) \in C_b(\mathcal{P}(\mathbb{R}^2))$, the
mapping $(\gamma, \mu) \mapsto \Psi_\gamma(\mu)$ is continuous as a function $\mathcal{P}(\mathbb{R}^2) \times \mathcal{P}(\mathbb{R}^2) \to
\mathcal{P}(\mathbb{R}^2)$. It follows that the mapping $\gamma \mapsto \mu_*(\gamma) = \Psi_\gamma(\gamma)$ is continuous.

Assumptions (A1)-(A3) are therefore satisfied with the choice $\mathcal{D} = \mathcal{P}(\mathbb{R}^2)$.
By Corollary 4.2, the sequence of empirical measures $(\mu^N)$ for the toy model
satisfies the large deviation principle with good rate function $I$ given by
(3.6). Observe that the distribution $\gamma_0$ need not be the bivariate standard
normal distribution for the large deviation principle to hold; it can be any
probability measure on $\mathcal{B}(\mathbb{R}^2)$.
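
The fixed point argument of Example 4.1 can be traced on particles when $\gamma$ is an empirical measure. The sketch below represents each measure by its list of atoms and iterates $\mu \mapsto \Psi_\lambda(\mu)$; the function names, the choice $b = \tanh$ and the iteration count are assumptions made for illustration only.

```python
import math
import random


def psi(m_b_mu, y_pair):
    """psi(mu, (y, yt)) = (y, y + m_b(mu) + yt) as in (3.5).

    The measure mu enters only through the number m_b(mu), passed in directly.
    """
    y, yt = y_pair
    return (y, y + m_b_mu + yt)


def m_b(atoms, b):
    """m_b of the empirical measure of `atoms`: average of b over first coordinates."""
    return sum(b(x0) for x0, _ in atoms) / len(atoms)


def fixed_point_particles(y_sample, b, n_iter=5):
    """Iterate mu -> Psi_lambda(mu), where lambda is the empirical measure of y_sample.

    Returns the list of atom configurations after each iteration.
    """
    atoms = list(y_sample)  # start the iteration at mu = lambda
    history = []
    for _ in range(n_iter):
        atoms = [psi(m_b(atoms, b), y) for y in y_sample]
        history.append(atoms)
    return history


rng = random.Random(0)
y_sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
history = fixed_point_particles(y_sample, math.tanh)
# The first marginal never changes, so the iteration is stationary after one step.
print(history[0] == history[1])
```

Because $m_b$ sees only the first marginal, which $\psi$ leaves untouched, the iteration reaches the fixed point $\Psi_\lambda(\lambda)$ after a single step, in line with $\mu_*(\gamma) = \Psi_\gamma(\gamma)$.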

Example 4.2. Consider the following variation on the toy model of Section 3
and Example 4.1. For $N \in \mathbb{N}$, let $(Y_i^N(t))_{i\in\{1,\ldots,N\},\, t\in\{0,1\}}$ be independent
standard normal real random variables as above. Denote by $\gamma_0$ the bivariate
standard normal distribution and let $B \in \mathcal{B}(\mathbb{R})$ be a $\gamma_0$-continuity set, that
is, $\gamma_0(\partial(B \times \mathbb{R})) = 0$, where $\partial(B \times \mathbb{R})$ is the boundary of $B \times \mathbb{R}$. Define real
random variables $X_1^N(t), \ldots, X_N^N(t)$, $t \in \{0, 1\}$, by

\[
X_i^N(0) = Y_i^N(0), \qquad X_i^N(1) = X_i^N(0) + \left( \frac{1}{N} \sum_{j=1}^{N} \mathbf{1}_B\left(X_j^N(0)\right) \right) \cdot Y_i^N(1).
\]

For this new toy model, define $\psi: \mathcal{P}(\mathbb{R}^2) \times \mathbb{R}^2 \to \mathbb{R}^2$ by

\[
\psi(\mu, (y, \tilde{y})) = \left( y,\, y + \mu(B \times \mathbb{R}) \cdot \tilde{y} \right).
\]

With this choice of $\psi$, representation (4.1) holds and $\psi$ is measurable as
composition of measurable maps since $\mu \mapsto \mu(B \times \mathbb{R})$ is measurable with
respect to the Borel $\sigma$-algebra induced by the topology of weak convergence.
Based on $\psi$, define $\Psi$ according to (4.3). As in Example 4.1, one checks that
the fixed point equation (4.4) possesses a unique solution $\mu_*(\gamma) = \Psi_\gamma(\gamma)$ for
every $\gamma \in \mathcal{P}(\mathbb{R}^2)$. However, if $\partial(B \times \mathbb{R}) \neq \emptyset$, then $\mu_*(\cdot)$ is not continuous on
$\mathcal{P}(\mathbb{R}^2)$. On the other hand, if $\gamma \in \mathcal{P}(\mathbb{R}^2)$ is such that $R(\gamma\|\gamma_0) < \infty$, then $\gamma$ is
absolutely continuous with respect to $\gamma_0$, so that $B \times \mathbb{R}$ is also a $\gamma$-continuity
set. By the extended mapping theorem, it follows that $\mu_*(\cdot)$ is continuous at
any such $\gamma$. Assumptions (A1) - (A3) are therefore satisfied, again with the
choice $\mathcal{D} = \mathcal{P}(\mathbb{R}^2)$, and Corollary 4.2 yields the large deviation principle. In
this example, if $\gamma_0(B \times \mathbb{R}) < 1$, then the distribution of $\mu^N$, the empirical
measure of $X_1^N, \ldots, X_N^N$, is not absolutely continuous with respect to the
distribution of $\lambda^N$, the empirical measure of $Y_1^N, \ldots, Y_N^N$.

Example 4.3 (Discrete time systems). Let $T \in \mathbb{N}$. Let $\mathcal{X}_0$, $\mathcal{Y}_0$ be Polish
spaces, and let $\mathcal{X}$, $\mathcal{Y}$ be the Polish product spaces $\mathcal{X} = (\mathcal{X}_0)^{T+1}$ and $\mathcal{Y} =
(\mathcal{Y}_0)^{T}$, respectively. Let $\varphi_0: \mathcal{Y}_0 \to \mathcal{X}_0$, $\varphi: \{0, \ldots, T-1\} \times \mathcal{P}(\mathcal{X}_0) \times \mathcal{Y}_0 \to \mathcal{X}_0$
be continuous maps. Let $\gamma_0 \in \mathcal{P}(\mathcal{Y})$ and, for $N \in \mathbb{N}$, let $Y_1^N, \ldots, Y_N^N$
be independent and identically distributed $\mathcal{Y}$-valued random variables de-
fined on some probability space $(\Omega_N, \mathcal{F}_N, P_N)$ with common distribution
$\gamma_0$. Write $Y_i^N = (Y_i^N(t))_{t\in\{0,\ldots,T-1\}}$ and define $\mathcal{X}$-valued random variables
$X_1^N, \ldots, X_N^N$ with $X_i^N = (X_i^N(t))_{t\in\{0,\ldots,T\}}$ recursively by

\[
X_i^N(0) = \varphi_0\left(Y_i^N(0)\right), \qquad
X_i^N(t+1) = \varphi\left(t, \mu^N(t), Y_i^N(t+1)\right), \quad t \in \{0, \ldots, T-1\}, \tag{4.6}
\]

where $\mu^N(t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i^N(t)}$ is the empirical measure of $X_1^N, \ldots, X_N^N$ at
marginal (or time) $t$. In analogy with (4.6), define $\psi: \mathcal{P}(\mathcal{X}) \times \mathcal{Y} \to \mathcal{X}$ by

\[
\psi(\mu, (y_0, \ldots, y_{T-1}))(0) = \varphi_0(y_0), \qquad
\psi(\mu, (y_0, \ldots, y_{T-1}))(t+1) = \varphi\left(t, \mu(t), y_{t+1}\right), \quad t \in \{0, \ldots, T-1\},
\]

where $\mu(t)$ is the $t$-marginal of $\mu$. Then $\psi$ is measurable as a composition of
measurable maps, and representation (4.1) holds. Based on $\psi$, define $\Psi$ ac-
cording to (4.3). Using the recursive structure of (4.6) and the components of
$\psi$, one checks that the fixed point equation (4.4) has a unique solution $\mu_*(\gamma)$
given any $\gamma \in \mathcal{P}(\mathcal{Y})$. By the extended mapping theorem and the continuity
of $\varphi_0$ and $\varphi$, it follows that $\mu_*(\cdot)$ is continuous on $\mathcal{D} = \mathcal{P}(\mathcal{Y})$. Consequently,
Corollary 4.2 is applicable and yields the large deviation principle for the
sequence of "path space" empirical measures $(\mu^N)_{N\in\mathbb{N}}$.

Remark 4.1. Example 4.3 covers a large class of discrete time weakly in-
teracting Markov systems. The $(\mathcal{X}_0)^N$-valued sequence $X^N(0), \ldots, X^N(T)$
given by (4.6) enjoys the Markov property if the $(\mathcal{Y}_0)^N$-valued random vari-
ables $Y^N(0), \ldots, Y^N(T-1)$ are independent. The continuity assumption on
$\varphi_0$, $\varphi$ can be relaxed, for instance in the way indicated in Example 4.2.



5 Weakly interacting Ito processes 



In this section, we consider weakly interacting systems described by Ito pro-
cesses as studied in Budhiraja et al. (2012). We show that the Laplace prin-
ciple rate function derived there in variational form can be expressed in
non-variational form in terms of relative entropy. We do not give the most
general conditions under which the results hold; in particular, we assume
here that all particles obey the same deterministic initial condition.

Let $T > 0$ be a finite time horizon, let $d, d_1 \in \mathbb{N}$, and let $x_0 \in \mathbb{R}^d$. Set $\mathcal{X} =
C([0,T], \mathbb{R}^d)$, $\mathcal{Y} = C([0,T], \mathbb{R}^{d_1})$, equipped with the maximum norm topol-
ogy. Let $b$, $\sigma$ be predictable functionals defined on $[0,T] \times \mathcal{X} \times \mathcal{P}(\mathbb{R}^d)$ with
values in $\mathbb{R}^d$ and $\mathbb{R}^{d \times d_1}$, respectively. For $N \in \mathbb{N}$, let $((\Omega_N, \mathcal{F}^N, P_N), (\mathcal{F}_t^N))$
be a stochastic basis satisfying the usual hypotheses and carrying $N$ indepen-
dent $d_1$-dimensional $(\mathcal{F}_t^N)$-Wiener processes $W_1^N, \ldots, W_N^N$. The $N$-particle
system is described by the solution to the system of stochastic differential
equations

\[
dX_i^N(t) = b\left(t, X_i^N(\cdot), \mu^N(t)\right) dt + \sigma\left(t, X_i^N(\cdot), \mu^N(t)\right) dW_i^N(t) \tag{5.1}
\]

with initial condition $X_i^N(0) = x_0$, $i \in \{1, \ldots, N\}$, where $\mu^N(t)$ is the em-
pirical measure of $X_1^N, \ldots, X_N^N$ at time $t \in [0,T]$, that is,

\[
\mu^N(t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i^N(t)}.
\]

Denote by $\mu^N$ the empirical measure of $(X_1^N, \ldots, X_N^N)$ over the time
interval $[0,T]$, that is, $\mu^N$ is the $\mathcal{P}(\mathcal{X})$-valued random variable defined by

\[
\mu^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_i^N(\cdot)}.
\]
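
For intuition, the particle system (5.1) can be approximated by an Euler-Maruyama scheme. The text does not use any discretization, so the sketch below is purely illustrative: it assumes $d = d_1 = 1$, coefficients that depend on the path only through the current state, and hypothetical coefficient functions `mean_reverting_drift` and `unit_diffusion`.

```python
import math
import random


def euler_maruyama(n_particles, x0, horizon, n_steps, b, sigma, rng):
    """Euler-Maruyama scheme for (5.1) with d = d_1 = 1.

    b and sigma take (t, x, particles), where the list `particles` of current
    states stands in for the empirical measure mu^N(t).
    Returns the list of particle configurations at the grid times.
    """
    dt = horizon / n_steps
    state = [x0] * n_particles
    path = [list(state)]
    for k in range(n_steps):
        t = k * dt
        frozen = list(state)  # empirical measure frozen over one time step
        state = [
            x + b(t, x, frozen) * dt
            + sigma(t, x, frozen) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for x in frozen
        ]
        path.append(list(state))
    return path


def mean_reverting_drift(t, x, particles):
    # Hypothetical drift: attraction towards the mean of the empirical measure.
    return -(x - sum(particles) / len(particles))


def unit_diffusion(t, x, particles):
    # Hypothetical constant diffusion coefficient.
    return 1.0


path = euler_maruyama(100, 0.0, 1.0, 200, mean_reverting_drift, unit_diffusion,
                      random.Random(0))
# Each entry of `path` lists the atoms of the time marginal of the empirical measure.
print(len(path), len(path[-1]))
```

Freezing the empirical measure over each time step mirrors, at the level of the scheme, the local freezing of the measure variable mentioned in the introduction.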

The asymptotic behavior of $\mu^N$ as $N$ tends to infinity can be characterized
in terms of solutions to the "nonlinear" stochastic differential equation

\[
dX(t) = b\left(t, X(\cdot), \mathrm{Law}(X(t))\right) dt + \sigma\left(t, X(\cdot), \mathrm{Law}(X(t))\right) dW(t) \tag{5.2}
\]

with $\mathrm{Law}(X(0)) = \delta_{x_0}$, where $W$ is a standard $d_1$-dimensional Wiener pro-
cess defined on some stochastic basis. Notice that the law of the solution
itself appears in the coefficients of Equation (5.2). The corresponding Kol-
mogorov forward equation is therefore in general a nonlinear parabolic partial
differential equation, and it corresponds to the McKean-Vlasov equation of
the weakly interacting system defined by (5.1).

For the statement of the Laplace principle, we need to consider controlled
versions of Equations (5.1) and (5.2), respectively. For $N \in \mathbb{N}$,
let $\mathcal{U}_N$ be the space of all $(\mathcal{F}^N_t)$-progressively
measurable functions $u : [0,T] \times \Omega_N \to \mathbb{R}^{N \times d_1}$
such that

$E_N\left[ \int_0^T |u(t)|^2\,dt \right] < \infty$,

where $E_N$ denotes expectation with respect to $P_N$. Clearly, with
$u = (u_1, \ldots, u_N)$, $|u(t)|^2 = \sum_{i=1}^{N} |u_i(t)|^2$. Given
$u \in \mathcal{U}_N$, the counterpart of Equation (5.1) is the system of
controlled stochastic differential equations

(5.3)  $dX^N_i(t) = b\bigl(t, X^N_i(\cdot), \mu^N(t)\bigr)\,dt + \sigma\bigl(t, X^N_i(\cdot), \mu^N(t)\bigr)\,u_i(t)\,dt + \sigma\bigl(t, X^N_i(\cdot), \mu^N(t)\bigr)\,dW^N_i(t)$,

with initial condition $X^N_i(0) = x_0$, where $\mu^N(t)$ denotes the
empirical measure of $X^N_1, \ldots, X^N_N$ at time $t$.

Let $\mathcal{U}$ be the set of quadruples
$((\Omega, \mathcal{F}, P), (\mathcal{F}_t), u, W)$ such that the pair
$((\Omega, \mathcal{F}, P), (\mathcal{F}_t))$ forms a stochastic basis
satisfying the usual hypotheses, $W$ is a $d_1$-dimensional
$(\mathcal{F}_t)$-Wiener process, and $u$ is an $\mathbb{R}^{d_1}$-valued
$(\mathcal{F}_t)$-progressively measurable process with
$\int_0^T |u(t)|\,dt < \infty$ $P$-almost surely. For simplicity, we may
write $u \in \mathcal{U}$ instead of
$((\Omega, \mathcal{F}, P), (\mathcal{F}_t), u, W) \in \mathcal{U}$. Given
$u \in \mathcal{U}$, the counterpart of Equation (5.2) is the controlled
"nonlinear" stochastic differential equation

(5.4)  $dX(t) = b\bigl(t, X(\cdot), \mathrm{Law}(X(t))\bigr)\,dt + \sigma\bigl(t, X(\cdot), \mathrm{Law}(X(t))\bigr)\,u(t)\,dt + \sigma\bigl(t, X(\cdot), \mathrm{Law}(X(t))\bigr)\,dW(t)$

with initial condition $\mathrm{Law}(X(0)) = \delta_{x_0}$. A solution of
Equation (5.4) under $u \in \mathcal{U}$ is a continuous
$\mathbb{R}^d$-valued process $X$ defined on the given stochastic basis and
adapted to the given filtration such that the integral version of
Equation (5.4) holds with probability one. Denote by $\mathcal{R}_1$ the
space of deterministic relaxed controls with finite first moments, that is,
$\mathcal{R}_1$ is the set of all positive measures $r$ on
$\mathcal{B}(\mathbb{R}^{d_1} \times [0,T])$ such that
$r(\mathbb{R}^{d_1} \times [0,t]) = t$ for all $t \in [0,T]$ and
$\int_{\mathbb{R}^{d_1} \times [0,T]} |y|\,r(dy \times dt) < \infty$. Equip
$\mathcal{R}_1$ with the topology of weak convergence of measures plus
convergence of first moments. Let $u \in \mathcal{U}$. The joint
distribution of $(u, W)$ can be identified with a probability measure on
$\mathcal{B}(\mathcal{R}_1 \times \mathcal{Y})$. If $X$ is a solution of
Equation (5.4) under $u$, then the joint distribution of $(X, u, W)$ can be
identified with a probability measure on $\mathcal{B}(\mathcal{Z})$, where
$\mathcal{Z} = \mathcal{X} \times \mathcal{R}_1 \times \mathcal{Y}$. Weak
uniqueness of
solutions is said to hold for Equation (5.4) if, whenever
$u, \tilde{u} \in \mathcal{U}$ and $X$, $\tilde{X}$ are two solutions of
Equation (5.4) under $u$ and $\tilde{u}$, respectively, such that
$P \circ (X(0))^{-1} = \tilde{P} \circ (\tilde{X}(0))^{-1}$, then
$P \circ (u, W, X)^{-1} = \tilde{P} \circ (\tilde{u}, \tilde{W}, \tilde{X})^{-1}$
as probability measures on
$\mathcal{B}(\mathcal{R}_1 \times \mathcal{Y} \times \mathcal{X})$; cf.
Definition 5.1 in Budhiraja et al. (2012). Notice that here we give a
process version of what can be equivalently formulated in terms of
probability measures on $\mathcal{B}(\mathcal{Z})$.

The Laplace principle given in Theorem 5.1 below is a version of
Theorem 7.1 in Budhiraja et al. (2012); also cf. Theorem 3.1 and Remark 3.2
there. The following assumptions are sufficient for the Laplace principle to
hold:

(H1) The functions $b$, $\sigma$ are continuous, and $b(t, \cdot, \cdot)$,
$\sigma(t, \cdot, \cdot)$ are uniformly continuous and bounded on sets
$B \times P$ whenever $B \subset \mathcal{X}$ is bounded and
$P \subset \mathcal{P}(\mathbb{R}^d)$ is compact, uniformly in $t \in [0,T]$.

(H2) For all $N \in \mathbb{N}$, existence and uniqueness of solutions holds
in the strong sense for the system of $N$ equations given by (5.1).

(H3) Weak uniqueness of solutions holds for Equation (5.4).
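(H4) If $u^N \in \mathcal{U}_N$, $N \in \mathbb{N}$, are such that

$\sup_{N \in \mathbb{N}} E_N\left[ \frac{1}{N} \sum_{i=1}^{N} \int_0^T |u^N_i(t)|^2\,dt \right] < \infty$,

then $\{\mu^N : N \in \mathbb{N}\}$ is tight as a family of
$\mathcal{P}(\mathcal{X})$-valued random variables, where $\mu^N$ is the
empirical measure of the solution to the system of equations (5.3) under
$u^N$.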



Let $\tilde{\mathcal{U}}$ be the set of quadruples
$((\Omega, \mathcal{F}, P), (\mathcal{F}_t), u, W) \in \mathcal{U}$ such that
$E\left[ \int_0^T |u(t)|^2\,dt \right] < \infty$.

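Theorem 5.1 (Budhiraja et al., 2012). Assume (H1)-(H4). Then the sequence
$(\mu^N)_{N \in \mathbb{N}}$ of $\mathcal{P}(\mathcal{X})$-valued random
variables satisfies the Laplace principle with rate function
$I : \mathcal{P}(\mathcal{X}) \to [0, \infty]$ given by

$I(\theta) = \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(X^u) = \theta} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$,

where $X^u$ is a solution of Equation (5.4) under $u$ over the time interval
$[0,T]$ with $\mathrm{Law}(X^u(0)) = \delta_{x_0}$, and
$\inf \emptyset = \infty$ by convention.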



Remark 5.1. The function $I$ of Theorem 5.1 is indeed a rate function, that
is, $I$ is lower semicontinuous with values in $[0, \infty]$. The following
hypothesis, which is analogous to the stability condition (S), is sufficient
to guarantee goodness of the rate function.

(H') If $(u_n)_{n \in \mathbb{N}} \subset \tilde{\mathcal{U}}$ is such that
$\sup_{n \in \mathbb{N}} E\left[ \int_0^T |u_n(t)|^2\,dt \right] < \infty$,
then $\{\mathrm{Law}(X^{u_n}) : n \in \mathbb{N}\}$ is tight in
$\mathcal{P}(\mathcal{X})$.

Under this additional assumption, $I$ is a good rate function and the
Laplace principle implies the large deviation principle.

Consider the special case in which $d = d_1$, $x_0 = 0$, $b \equiv 0$, and
$\sigma \equiv \mathrm{Id}_d$. In this case, $\mathcal{X} = \mathcal{Y}$ and
$\mu^N$ is the empirical measure of $N$ independent Wiener processes
$W^N_1, \ldots, W^N_N$. Let $\gamma_0$ be Wiener measure on
$\mathcal{B}(\mathcal{Y})$. Since $\mathrm{Law}(W^N_i) = \gamma_0$ for all
$i$, Sanov's theorem implies that the sequence $(\mu^N)_{N \in \mathbb{N}}$
satisfies the large deviation / Laplace principle with good rate function
$R(\cdot \| \gamma_0)$. On the other hand, by Theorem 5.1,
$(\mu^N)_{N \in \mathbb{N}}$ satisfies the Laplace principle with rate
function

$J(\gamma) = \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(Y^u) = \gamma} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$,  $\gamma \in \mathcal{P}(\mathcal{Y})$,

where $Y^u$ is the process given by

(5.5)  $Y^u(t) = \int_0^t u(s)\,ds + W(t)$,  $t \in [0,T]$.



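One checks that $J : \mathcal{P}(\mathcal{Y}) \to [0, \infty]$ has compact
sublevel sets, hence is a good rate function. It follows that $J$ coincides
with the rate function obtained from Sanov's theorem. Consequently, for all
$\gamma \in \mathcal{P}(\mathcal{Y})$,

(5.6)  $R(\gamma \| \gamma_0) = \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(Y^u) = \gamma} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$.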
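A simple sanity check of (5.6), sketched under assumptions: take $d = 1$ and the deterministic constant control $u \equiv c$, so that $\gamma = \mathrm{Law}(W + c\,\cdot)$. By the Cameron-Martin formula, $\log \frac{d\gamma}{d\gamma_0}(y) = c\,y(T) - \frac{1}{2} c^2 T$, hence $R(\gamma \| \gamma_0) = \frac{1}{2} c^2 T$, which is exactly the cost of this particular control. The Python sketch below compares a Monte Carlo estimate of the entropy with that cost. The function names and parameter values are hypothetical, and the check covers only this one control, not the infimum over all of them.

```python
import math
import random


def control_cost(c, horizon):
    """Cost (1/2) * int_0^T |u(t)|^2 dt of the constant control u(t) = c."""
    return 0.5 * c * c * horizon


def entropy_estimate(c, horizon, n_samples=200_000, seed=1):
    """Monte Carlo estimate of R(gamma || gamma_0) for gamma = Law(W + c*t).

    Averages the Cameron-Martin log-density c*y(T) - c^2*T/2 over samples of
    Y(T) = W(T) + c*T drawn under gamma (only the endpoint is needed).
    """
    rng = random.Random(seed)
    sd = math.sqrt(horizon)
    total = 0.0
    for _ in range(n_samples):
        y_end = c * horizon + sd * rng.gauss(0.0, 1.0)
        total += c * y_end - 0.5 * c * c * horizon
    return total / n_samples
```

With, say, `c = 1.5` and `horizon = 2.0`, the estimate should lie close to `control_cost(1.5, 2.0)`, up to Monte Carlo error.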



Remark 5.2. Equation (5.6) provides a "weak" variational representation of
relative entropy with respect to Wiener measure. In Appendix B we give
a direct proof of Equation (5.6). The variational representation is weak in
the sense that the underlying stochastic basis may vary. In particular, the
control process $u$ may be adapted to a filtration that is strictly bigger
than the natural filtration of the Wiener process. Notice that expectation
in (5.6) is taken with respect to the probability measure of the stochastic
basis that comes with the control process $u$.

Remark 5.3. Representation (5.6) may be compared to the following result
obtained by Üstünel (2009). Take as stochastic basis the canonical set-up;
in our notation,
$((\mathcal{Y}, \mathcal{B}(\mathcal{Y}), \gamma_0), (\mathcal{B}_t))$, where
$(\mathcal{B}_t)$ is the canonical filtration. Let $W$ be the coordinate
process. Thus $W$ is a $d_1$-dimensional Wiener process under $\gamma_0$
with respect to $(\mathcal{B}_t)$. Let $u$ be an $\mathbb{R}^{d_1}$-valued
$(\mathcal{B}_t)$-progressively measurable process such that
$E_{\gamma_0}\left[ \int_0^T |u(t)|^2\,dt \right] < \infty$. Consider
$Y^u = \int_0^{\cdot} u(s)\,ds + W(\cdot)$. Since
$Y^u(\cdot, \omega) = \int_0^{\cdot} u(s, \omega)\,ds + \omega(\cdot)$ for all
$\omega \in \mathcal{Y}$, $Y^u$ induces a Borel measurable mapping
$\mathcal{Y} \to \mathcal{Y}$. Set $\gamma = \gamma_0 \circ (Y^u)^{-1}$.
By Theorem 8 in Üstünel (2009),

(5.7)  $R(\gamma \| \gamma_0) \le E_{\gamma_0}\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$.



Assume in addition that $u$ is such that

$E_{\gamma_0}\left[ \exp\left( -\int_0^T u(t) \cdot dW(t) - \frac{1}{2} \int_0^T |u(t)|^2\,dt \right) \right] = 1$

and that, for some $\mathbb{R}^{d_1}$-valued $(\mathcal{B}_t)$-progressively
measurable process $v$,

$\frac{d\gamma}{d\gamma_0} = \exp\left( -\int_0^T v(t) \cdot dW(t) - \frac{1}{2} \int_0^T |v(t)|^2\,dt \right)$  $\gamma_0$-a.s.

Theorem 7 in Üstünel (2009) then states that equality holds in (5.7) if and
only if $Y^u$ is $\gamma_0$-almost surely invertible as a mapping
$\mathcal{Y} \to \mathcal{Y}$ with inverse
$Y^v = \int_0^{\cdot} v(s)\,ds + W(\cdot)$.

Let us return to the general case. Given $\theta \in \mathcal{P}(\mathcal{X})$,
denote by $\theta(t)$ the marginal distribution of $\theta$ at time $t$,
that is, the image of $\theta$ under the projection
$\mathcal{X} \to \mathbb{R}^d$ on coordinate $t$. Consider the stochastic
differential equation

(5.8)  $dX(t) = b\bigl(t, X(\cdot), \theta(t)\bigr)\,dt + \sigma\bigl(t, X(\cdot), \theta(t)\bigr)\,dW(t)$.

Equation (5.8) results from freezing the measure variable in Equation (5.2)
at $\theta$. We will assume existence and pathwise uniqueness for
Equation (5.8).

(H5) Given any $\theta \in \mathcal{P}(\mathcal{X})$, weak existence and
pathwise uniqueness hold for Equation (5.8).



Based on representation (5.6) and the contraction property of relative
entropy, the rate function of Theorem 5.1 can be shown to be of relative
entropy form.

Theorem 5.2. Assume (H1)-(H5). Then the rate function $I$ of Theorem 5.1
can be expressed in relative entropy form as

$I(\theta) = R\bigl(\theta \,\|\, \Psi(\theta)\bigr)$,  $\theta \in \mathcal{P}(\mathcal{X})$,

where $\Psi(\theta)$ is the law of the unique solution of Equation (5.8)
under $\theta$ over the time interval $[0,T]$ with initial condition
$X(0) = x_0$.
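As a consistency check rather than a new result, the special case $d = d_1$, $x_0 = 0$, $b \equiv 0$, $\sigma \equiv \mathrm{Id}_d$ considered earlier in this section can be worked out by hand. The short computation below assumes that the hypotheses of Theorem 5.2 are in force in that case; it shows that the relative entropy form then reduces to the rate function of Sanov's theorem.

```latex
% Frozen equation (5.8) with b = 0, sigma = Id_d, x_0 = 0:
dX(t) = dW(t), \qquad X(0) = 0,
% so the solution is X = W, whatever the frozen measure \theta is:
\Psi(\theta) = \mathrm{Law}(W) = \gamma_0
  \quad \text{for all } \theta \in \mathcal{P}(\mathcal{X}),
% and the relative entropy form of Theorem 5.2 becomes
I(\theta) = R\bigl(\theta \,\|\, \Psi(\theta)\bigr) = R(\theta \,\|\, \gamma_0).
```

In this case the reference measure does not depend on $\theta$, which is exactly the situation of independent and identically distributed samples.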

Remark 5.4. Assumption (H5) can be weakened by requiring weak existence
and pathwise uniqueness of solutions to Equation (5.8) only for
$\theta \in \mathcal{P}(\mathcal{X})$ such that $I(\theta) < \infty$. Those
measures $\theta$ are, by definition of $I$, distributions of Itô
processes. The function $\Psi$ introduced in Theorem 5.2 would then be
defined only on the effective domain of $I$; for
$\theta \in \mathcal{P}(\mathcal{X})$ with $I(\theta) = \infty$, one can then
choose $\Psi(\theta)$ in such a way that $\theta$ is not absolutely
continuous with respect to $\Psi(\theta)$ (for instance, by choosing between
two Dirac measures).

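Proof of Theorem 5.2. Let $\theta \in \mathcal{P}(\mathcal{X})$. By
hypothesis, weak existence and pathwise uniqueness hold for Equation (5.8).
By a result originally due to Yamada and Watanabe (cf. Kallenberg, 1996),
there is a Borel measurable mapping
$\psi_\theta : \mathbb{R}^d \times \mathcal{Y} \to \mathcal{X}$ such that

(5.9)  $\psi_\theta(x_0, W) = X$  $P$-almost surely

whenever $X$ is a solution of Equation (5.8) under $\theta$ over the time
interval $[0,T]$ with initial condition $X(0) = x_0$ on some stochastic basis
$((\Omega, \mathcal{F}, P), (\mathcal{F}_t))$ carrying a $d_1$-dimensional
Wiener process $W$. For such a solution, $\Psi(\theta) = \mathrm{Law}(X)$ by
definition. Set $\psi_\theta(\cdot) = \psi_\theta(x_0, \cdot)$, and let
$\gamma_0$ be Wiener measure on $\mathcal{B}(\mathcal{Y})$. By
Equation (5.9), $\Psi(\theta) = \psi_\theta(\gamma_0) = \gamma_0 \circ \psi_\theta^{-1}$.
By Lemma A.1, the contraction property of relative entropy, and
representation (5.6) it follows that

$R\bigl(\theta \| \Psi(\theta)\bigr) = R\bigl(\theta \| \psi_\theta(\gamma_0)\bigr)$
$= \inf_{\gamma \in \mathcal{P}(\mathcal{Y}) :\, \psi_\theta(\gamma) = \theta} R(\gamma \| \gamma_0)$
$= \inf_{\gamma \in \mathcal{P}(\mathcal{Y}) :\, \psi_\theta(\gamma) = \theta} \; \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(Y^u) = \gamma} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$
$= \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(\psi_\theta(Y^u)) = \theta} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$,

where $Y^u$ is defined by (5.5). Let $u \in \tilde{\mathcal{U}}$, and set
$\bar{X}^u = \psi_\theta(Y^u)$. Then, as a consequence of Equation (5.9),
$\bar{X}^u$ solves

$dX(t) = b\bigl(t, X(\cdot), \theta(t)\bigr)\,dt + \sigma\bigl(t, X(\cdot), \theta(t)\bigr)\,u(t)\,dt + \sigma\bigl(t, X(\cdot), \theta(t)\bigr)\,dW(t)$

with initial distribution $\delta_{x_0}$. If $u$ is such that
$\mathrm{Law}(\psi_\theta(Y^u)) = \theta$, then $\bar{X}^u$ is a solution of
Equation (5.4) under $u$ with initial distribution $\delta_{x_0}$. By
Assumption (H3), weak uniqueness holds for Equation (5.4), hence
$\mathrm{Law}(\bar{X}^u) = \mathrm{Law}(X^u)$ whenever $X^u$ is a solution of
(5.4) under $u$ with $\mathrm{Law}(X^u(0)) = \delta_{x_0}$. It follows that

$R\bigl(\theta \| \Psi(\theta)\bigr) = \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(\psi_\theta(Y^u)) = \theta} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$
$= \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(X^u) = \theta} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right] = I(\theta)$,

where $I$ is the rate function of Theorem 5.1.  □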



Remark 5.5. Assuming in addition to (H1)-(H5) hypothesis (H') of
Remark 5.1, Theorem 5.2 can be proved by applying both Sanov's theorem and
Theorem 5.1 to the weakly interacting system given by equations (5.1) with
measure variable frozen at $\theta \in \mathcal{P}(\mathcal{X})$ and then
evaluating the resulting rate functions at $\theta$.



A Contraction property of relative entropy 

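Let $\mathcal{X}$, $\mathcal{Y}$ be Polish spaces. Denote by
$\Pi_{\mathcal{X}}$, $\Pi_{\mathcal{Y}}$ the collection of all finite and
measurable partitions of $\mathcal{X}$ and $\mathcal{Y}$, respectively.
Recall that relative entropy can be approximated in terms of finite sums;
for $\eta, \nu \in \mathcal{P}(\mathcal{X})$,

(A.1)  $R(\eta \| \nu) = \sup_{\pi \in \Pi_{\mathcal{X}}} \sum_{A \in \pi} \eta(A) \log \frac{\eta(A)}{\nu(A)}$,

see, for instance, Lemma 1.4.3(g) in Dupuis and Ellis (1997, p. 30). For
$\psi : \mathcal{Y} \to \mathcal{X}$ measurable,
$\gamma \in \mathcal{P}(\mathcal{Y})$, denote by
$\psi(\gamma) = \gamma \circ \psi^{-1}$ the image measure of $\gamma$ under
$\psi$.

The following lemma extends Lemma E.2.1 in Dupuis and Ellis (1997,
p. 366), where the case of bijective bi-measurable mappings is treated, to
arbitrary measurable transformations.

Lemma A.1. Let $\psi : \mathcal{Y} \to \mathcal{X}$ be a Borel measurable
mapping. Let $\eta \in \mathcal{P}(\mathcal{X})$,
$\gamma_0 \in \mathcal{P}(\mathcal{Y})$. Then

(A.2)  $R\bigl(\eta \| \psi(\gamma_0)\bigr) = \inf_{\gamma \in \mathcal{P}(\mathcal{Y}) :\, \psi(\gamma) = \eta} R(\gamma \| \gamma_0)$,

where $\inf \emptyset = \infty$ by convention.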

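Proof. Suppose $\gamma \in \mathcal{P}(\mathcal{Y})$ is such that
$\psi(\gamma) = \eta$. Then, by (A.1) and the definition of image measure,

$R\bigl(\eta \| \psi(\gamma_0)\bigr) = \sup_{\pi \in \Pi_{\mathcal{X}}} \sum_{A \in \pi} \eta(A) \log \frac{\eta(A)}{\gamma_0(\psi^{-1}(A))}$
$= \sup_{\pi \in \Pi_{\mathcal{X}}} \sum_{B \in \psi^{-1}(\pi)} \gamma(B) \log \frac{\gamma(B)}{\gamma_0(B)}$
$\le \sup_{\tilde{\pi} \in \Pi_{\mathcal{Y}}} \sum_{B \in \tilde{\pi}} \gamma(B) \log \frac{\gamma(B)}{\gamma_0(B)}$
$= R(\gamma \| \gamma_0)$,

where $\psi^{-1}(\pi)$ denotes the partition of $\mathcal{Y}$ induced by the
inverse images of $\psi$. More precisely,
$\psi^{-1}(\pi) = \{\psi^{-1}(A) : A \in \pi\}$. Notice that $\psi^{-1}(\pi)$
is indeed a finite and measurable partition of $\mathcal{Y}$ since $\pi$ is
a finite and measurable partition of $\mathcal{X}$, inverse images under
$\psi$ are Borel measurable and
$\psi^{-1}(A) \cap \psi^{-1}(\tilde{A}) = \emptyset$ whenever
$A \cap \tilde{A} = \emptyset$. Since $\inf \emptyset = \infty$, it follows
that

$R\bigl(\eta \| \psi(\gamma_0)\bigr) \le \inf_{\gamma \in \mathcal{P}(\mathcal{Y}) :\, \psi(\gamma) = \eta} R(\gamma \| \gamma_0)$.

If $R(\eta \| \psi(\gamma_0)) = \infty$, then the above inequality is
necessarily an equality, namely $\infty = \infty$. Thus in order to show the
opposite inequality, we may assume that $R(\eta \| \psi(\gamma_0)) < \infty$.
Now $R(\eta \| \psi(\gamma_0)) < \infty$ implies that $\eta$ is absolutely
continuous with respect to $\psi(\gamma_0)$, hence possesses a density
$f = \frac{d\eta}{d\psi(\gamma_0)}$. Set

$\gamma(C) = \int_C f(\psi(y))\,\gamma_0(dy)$,  $C \in \mathcal{B}(\mathcal{Y})$.

Then $\gamma$ is a probability measure having density $f \circ \psi$ with
respect to $\gamma_0$. Using the integral transformation formula and the
definition of $f$, we have for all $A \in \mathcal{B}(\mathcal{X})$,

$\psi(\gamma)(A) = \int_{\mathcal{Y}} 1_{\psi^{-1}(A)}(y) \cdot f(\psi(y))\,\gamma_0(dy)$
$= \int_{\mathcal{Y}} 1_A(\psi(y)) \cdot f(\psi(y))\,\gamma_0(dy)$
$= \int_{\mathcal{X}} 1_A(x) \cdot f(x)\,\psi(\gamma_0)(dx)$
$= \eta(A)$,

which means that $\psi(\gamma) = \eta$. Recalling that
$f \circ \psi = \frac{d\gamma}{d\gamma_0}$, we obtain in the same way

$R(\gamma \| \gamma_0) = \int_{\mathcal{Y}} f(\psi(y)) \log\bigl(f(\psi(y))\bigr)\,\gamma_0(dy)$
$= \int_{\mathcal{X}} f(x) \log\bigl(f(x)\bigr)\,\psi(\gamma_0)(dx)$
$= R\bigl(\eta \| \psi(\gamma_0)\bigr)$,

which proves inequality "$\ge$" in (A.2).  □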

The proof of Lemma A.1 shows that the probability measure $\gamma$ defined
by $\gamma(dy) = \frac{d\eta}{d\psi(\gamma_0)}(\psi(y))\,\gamma_0(dy)$ attains
the infimum in (A.2) whenever that infimum is finite.
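On finite spaces the contraction property (A.2) and its minimizer can be checked directly. The following Python sketch is only an illustration under assumptions: the spaces, the mapping `psi`, and the measures `gamma0` and `eta` are hypothetical toy data, and the helper names are not from the text. It builds the measure $\gamma(y) = f(\psi(y))\,\gamma_0(y)$ with $f = d\eta / d\psi(\gamma_0)$ and provides the pieces needed to compare both sides of (A.2).

```python
import math


def relative_entropy(p, q):
    """R(p || q) for finite distributions given as dicts; inf if p is not << q."""
    total = 0.0
    for key, mass in p.items():
        if mass == 0.0:
            continue
        if q.get(key, 0.0) == 0.0:
            return math.inf
        total += mass * math.log(mass / q[key])
    return total


def push_forward(psi, gamma):
    """Image measure psi(gamma) = gamma o psi^{-1} on the finite target space."""
    image = {}
    for y, mass in gamma.items():
        image[psi[y]] = image.get(psi[y], 0.0) + mass
    return image


def minimizer(psi, gamma0, eta):
    """gamma(y) = f(psi(y)) * gamma0(y), where f = d eta / d psi(gamma0)."""
    image0 = push_forward(psi, gamma0)
    return {y: eta.get(psi[y], 0.0) / image0[psi[y]] * mass
            for y, mass in gamma0.items() if mass > 0.0}


# Hypothetical toy data: Y = {0, 1, 2, 3}, X = {"a", "b"}.
psi = {0: "a", 1: "a", 2: "b", 3: "b"}
gamma0 = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
eta = {"a": 0.7, "b": 0.3}
```

Calling `relative_entropy(minimizer(psi, gamma0, eta), gamma0)` and `relative_entropy(eta, push_forward(psi, gamma0))` should return the same value, while any other measure on the source space with image `eta` should have relative entropy at least as large.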



B Relative entropy with respect to Wiener measure 

Let $\mathcal{Y}$ be the Polish space $C([0,T], \mathbb{R}^d)$ equipped with
the maximum norm topology. Let $\tilde{\mathcal{U}}$ be defined as in
Section 5 with $d_1 = d$. Thus $\tilde{\mathcal{U}}$ is the set of
quadruples $((\Omega, \mathcal{F}, P), (\mathcal{F}_t), u, W)$ such that the
pair $((\Omega, \mathcal{F}, P), (\mathcal{F}_t))$ forms a stochastic basis
satisfying the usual hypotheses, $W$ is a $d$-dimensional
$(\mathcal{F}_t)$-Wiener process, and $u$ is an $\mathbb{R}^d$-valued
$(\mathcal{F}_t)$-progressively measurable process with
$E\left[ \int_0^T |u(t)|^2\,dt \right] < \infty$. Given
$u \in \tilde{\mathcal{U}}$, define $Y^u$ according to (5.5), that is,

$Y^u(t) = W(t) + \int_0^t u(s)\,ds$,  $t \in [0,T]$.

The following result provides a variational representation of relative
entropy with respect to Wiener measure.

Lemma B.1. Let $\gamma_0$ be Wiener measure on $\mathcal{B}(\mathcal{Y})$.
Then for all $\gamma \in \mathcal{P}(\mathcal{Y})$,

(B.1)  $R(\gamma \| \gamma_0) = \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(Y^u) = \gamma} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$,

where $\inf \emptyset = \infty$ by convention.

The proof of inequality "$\le$" in (B.1) relies on the lower semicontinuity
of relative entropy and the Donsker-Varadhan variational formula; it may be
compared with the first part of the proof of Theorem 3.1 in Boué and Dupuis
(1998). The proof of inequality "$\ge$" exploits the variational formulation
and uses arguments contained in Föllmer (1985, 1986).

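Proof of Lemma B.1. In order to prove inequality "$\le$" in (B.1), it
suffices to show that, for all $u \in \tilde{\mathcal{U}}$,

(B.2)  $R\bigl(\mathrm{Law}(Y^u) \| \gamma_0\bigr) \le E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$.

Let $u \in \tilde{\mathcal{U}}$, and set
$\gamma = \mathrm{Law}(Y^u) = P \circ (Y^u)^{-1}$. In accordance with
Definition 3.2.3 in Karatzas and Shreve (1991, p. 132), a process $v$
defined on $((\Omega, \mathcal{F}, P), (\mathcal{F}_t))$ is called simple if
there are $N \in \mathbb{N}$, $0 = t_0 < \ldots < t_N = T$, and uniformly
bounded $\mathbb{R}^d$-valued random variables $\xi_0, \ldots, \xi_{N-1}$
such that $\xi_i$ is $\mathcal{F}_{t_i}$-measurable and

$v(t, \omega) = \xi_0(\omega) 1_{\{0\}}(t) + \sum_{i=0}^{N-1} \xi_i(\omega) 1_{(t_i, t_{i+1}]}(t)$.

By Proposition 3.2.6 in Karatzas and Shreve (1991, p. 134), there exists a
sequence $(v_n)_{n \in \mathbb{N}}$ of simple processes such that
$E\left[ \int_0^T |u(t) - v_n(t)|^2\,dt \right] \to 0$ as $n \to \infty$. Let
$(v_n)_{n \in \mathbb{N}}$ be such a sequence. For $n \in \mathbb{N}$, set
$\gamma_n = \mathrm{Law}(Y^{v_n})$. Then $\gamma_n \to \gamma$ in
$\mathcal{P}(\mathcal{Y})$ since

$E\left[ \sup_{t \in [0,T]} |Y^u(t) - Y^{v_n}(t)|^2 \right] \le T \cdot E\left[ \int_0^T |u(t) - v_n(t)|^2\,dt \right] \to 0$  as $n \to \infty$.

Therefore, by the lower semicontinuity of $R(\cdot \| \gamma_0)$,

$R\bigl(\mathrm{Law}(Y^u) \| \gamma_0\bigr) = R(\gamma \| \gamma_0) \le \liminf_{n \to \infty} R(\gamma_n \| \gamma_0) = \liminf_{n \to \infty} R\bigl(\mathrm{Law}(Y^{v_n}) \| \gamma_0\bigr)$.

On the other hand,
$E\left[ \frac{1}{2} \int_0^T |v_n(t)|^2\,dt \right] \to E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$
as $n \to \infty$. It is therefore enough to show that (B.2) holds whenever
$u$ is a simple process.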
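Thus assume that $u$ is simple. Let $Z$ be the $\mathcal{F}_T$-measurable
$(0, \infty)$-valued random variable given by

$Z = \exp\left( -\int_0^T u(s) \cdot dW(s) - \frac{1}{2} \int_0^T |u(s)|^2\,ds \right)$.

Notice that $E[Z] = 1$ since $u$ is uniformly bounded. Define a probability
measure $\tilde{P}$ on $(\Omega, \mathcal{F}_T)$ by

$\frac{d\tilde{P}}{dP} = Z$.

By Girsanov's theorem (Theorem 3.5.1 in Karatzas and Shreve, 1991, p. 191),
$Y^u$ is an $(\mathcal{F}_t)$-Wiener process with respect to $\tilde{P}$. By
the Donsker-Varadhan variational formula for relative entropy
(Lemma 1.4.3(a) in Dupuis and Ellis, 1997, p. 29),

(B.3)  $R(\gamma \| \gamma_0) = \sup_{g \in C_b(\mathcal{Y})} \left\{ \int_{\mathcal{Y}} g(y)\,\gamma(dy) - \log \int_{\mathcal{Y}} e^{g(y)}\,\gamma_0(dy) \right\}$.

Recall that $\gamma = P \circ (Y^u)^{-1}$ and $\gamma_0 = P \circ W^{-1}$, but
also $\gamma_0 = \tilde{P} \circ (Y^u)^{-1}$ since $Y^u$ is a Wiener process
under $\tilde{P}$. Let $\tilde{E}$ denote expectation with respect to
$\tilde{P}$. By the convexity of $-\log$ and Jensen's inequality, for all
$g \in C_b(\mathcal{Y})$,

$\int_{\mathcal{Y}} g(y)\,\gamma(dy) - \log \int_{\mathcal{Y}} e^{g(y)}\,\gamma_0(dy)$
$= E[g(Y^u)] - \log E[\exp(g(W))]$
$= E[g(Y^u)] - \log \tilde{E}[\exp(g(Y^u))]$
$= E[g(Y^u)] - \log E[\exp(g(Y^u)) \cdot Z]$
$= E[g(Y^u)] - \log E\left[ \exp\left( g(Y^u) - \int_0^T u(t) \cdot dW(t) - \frac{1}{2} \int_0^T |u(t)|^2\,dt \right) \right]$
$\le E[g(Y^u)] - E\left[ g(Y^u) - \int_0^T u(t) \cdot dW(t) - \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$
$= E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$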



since $E\left[ \int_0^T u(t) \cdot dW(t) \right] = 0$ as $u$ is square
integrable. In view of (B.3), inequality (B.2) follows.

In order to prove inequality "$\ge$" in (B.1), it suffices to consider
probability measures with finite relative entropy with respect to Wiener
measure. Let $\gamma \in \mathcal{P}(\mathcal{Y})$ be such that
$R(\gamma \| \gamma_0) < \infty$. In particular, $\gamma$ is absolutely
continuous with respect to $\gamma_0$. We have to show that there exists
$u \in \tilde{\mathcal{U}}$ such that $\mathrm{Law}(Y^u) = \gamma$ and
$R(\gamma \| \gamma_0) \ge E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right]$.
Let $Y$ be the coordinate process on the canonical space
$(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$, and let
$(\mathcal{B}_t)_{t \in [0,T]}$ be the canonical filtration (the natural
filtration of $Y$). Denote by $(\hat{\mathcal{B}}_t)$ the
$\gamma_0$-augmentation of $(\mathcal{B}_t)$. Both $\gamma_0$ and $\gamma$
extend naturally to $\hat{\mathcal{B}}_T \supseteq \mathcal{B}(\mathcal{Y})$.
Clearly, $Y$ is a $(\hat{\mathcal{B}}_t)$-Wiener process under $\gamma_0$.
Since $R(\gamma \| \gamma_0) < \infty$, there is a $[0, \infty)$-valued
$\hat{\mathcal{B}}_T$-measurable random variable $\xi$ such that

$\frac{d\gamma}{d\gamma_0} = \xi$,  $E_{\gamma_0}[\xi] = 1$,  $E_{\gamma}[|\log(\xi)|] = E_{\gamma_0}[|\log(\xi)|\,\xi] < \infty$.

Set $Z(t) = E_{\gamma_0}[\xi \,|\, \hat{\mathcal{B}}_t]$, $t \in [0,T]$. By a
version of Itô's martingale representation theorem (Theorem III.4.33 in
Jacod and Shiryaev, 2003, p. 189), there exists an $\mathbb{R}^d$-valued
$(\hat{\mathcal{B}}_t)$-progressively measurable process $v$ such that
$\gamma_0\left( \int_0^T |v(t)|^2\,dt < \infty \right) = 1$ and

(B.4)  $Z(t) = 1 + \int_0^t v(s) \cdot dY(s)$  for all $t \in [0,T]$, $\gamma_0$-a.s.

In particular, $Z$ is a continuous process. By the continuity and
martingale property of $Z$, and since $Z(T) = \xi$,

$\gamma\left( \inf_{t \in [0,T]} Z(t) > 0 \right) = 1$.

Define an $\mathbb{R}^d$-valued $(\hat{\mathcal{B}}_t)$-progressively
measurable process $u$ by

(B.5)  $u(t) = \frac{1}{Z(t)} \cdot v(t) \cdot 1_{\{\inf_{s \in [0,t]} Z(s) > 0\}}$,  $t \in [0,T]$.

Thus $u(t) = v(t)/Z(t)$ $\gamma$-almost surely. Applying Itô's formula to
calculate stochastic differentials, one checks that

(B.6)  $Z(t) = \exp\left( \int_0^t u(s) \cdot dY(s) - \frac{1}{2} \int_0^t |u(s)|^2\,ds \right)$  for all $t \in [0,T]$, $\gamma$-a.s.

Set $\tilde{Y}(t) = Y(t) - \int_0^t u(s)\,ds$, $t \in [0,T]$. Then
$\tilde{Y}$ is a $(\hat{\mathcal{B}}_t)$-Wiener process with respect to
$\gamma$. Clearly, $\tilde{Y}$ is continuous and
$(\hat{\mathcal{B}}_t)$-adapted. Since $\gamma$ is absolutely continuous
with respect to $\gamma_0$, the quadratic covariation processes of $Y$ are
the same with respect to $\gamma_0$ as well as to $\gamma$. Since
$\int_0^{\cdot} u(t)\,dt$ is a process of finite total variation with
$\gamma$-probability one, it follows that $\tilde{Y}$ has the same quadratic
covariations under $\gamma$ as $Y$ under $\gamma_0$. In view of Lévy's
characterization of the Wiener process (Theorem 3.3.16 in Karatzas and
Shreve, 1991, p. 157), it suffices to check that $\tilde{Y}$ is a local
martingale with respect to $(\hat{\mathcal{B}}_t)$ and $\gamma$. But this
follows from the version of Girsanov's theorem provided by Theorem III.3.11
in Jacod and Shiryaev (2003, pp. 168-169) and the fact that, thanks to
(B.4), the quadratic covariations of the continuous processes $Y_i$,
$i \in \{1, \ldots, d\}$, and $Z$ are given by

$[Y_i, Z](t) = \langle Y_i, Z \rangle(t) = \int_0^t v_i(s)\,ds$  for all $t \in [0,T]$, $\gamma_0$-a.s.,

and $v(t) = u(t) \cdot Z(t)$ $\gamma$-almost surely. For $n \in \mathbb{N}$,
define a $(\hat{\mathcal{B}}_t)$-stopping time $\tau_n$ by

$\tau_n = \inf\left\{ t \ge 0 : \int_0^t |u(s)|^2\,ds \ge n \right\} \wedge T$.



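Set

$\xi_n = \exp\left( \int_0^{\tau_n} u(s) \cdot dY(s) - \frac{1}{2} \int_0^{\tau_n} |u(s)|^2\,ds \right)$.

Then $\xi_n$ is well-defined with $\xi_n > 0$ $\gamma_0$-almost surely (hence
also $\gamma$-almost surely). By Novikov's criterion (Corollary 3.5.13 in
Karatzas and Shreve, 1991, p. 199) and the version of Girsanov's theorem
cited in the first part of the proof,

$\frac{d\gamma_n}{d\gamma_0} = \xi_n$

defines a probability measure $\gamma_n$ which is equivalent to $\gamma_0$.
As a consequence, $\gamma$ is absolutely continuous with respect to
$\gamma_n$ with density given by $\xi / \xi_n$. It follows that

$R(\gamma \| \gamma_0) = E_{\gamma}[\log(\xi)]$
$= E_{\gamma}\left[ \log\left( \frac{\xi}{\xi_n} \right) \right] + E_{\gamma}[\log(\xi_n)]$
$= R(\gamma \| \gamma_n) + E_{\gamma}\left[ \int_0^{\tau_n} u(s) \cdot dY(s) - \frac{1}{2} \int_0^{\tau_n} |u(s)|^2\,ds \right]$
$= R(\gamma \| \gamma_n) + E_{\gamma}\left[ \int_0^{\tau_n} u(s) \cdot d\tilde{Y}(s) \right] + E_{\gamma}\left[ \frac{1}{2} \int_0^{\tau_n} |u(s)|^2\,ds \right]$
$= R(\gamma \| \gamma_n) + E_{\gamma}\left[ \frac{1}{2} \int_0^{\tau_n} |u(s)|^2\,ds \right]$

since $\tilde{Y}$ is a $\gamma$-Wiener process and
$\int_0^T 1_{[0,\tau_n]}(s) \cdot |u(s)|^2\,ds \le n$ by construction of
$\tau_n$. Since relative entropy is nonnegative and
$E_{\gamma}\left[ \frac{1}{2} \int_0^{\tau_n} |u(s)|^2\,ds \right] \nearrow E_{\gamma}\left[ \frac{1}{2} \int_0^T |u(s)|^2\,ds \right]$
in $[0, \infty]$ as $n \to \infty$ by monotone convergence, we obtain

(B.7)  $R(\gamma \| \gamma_0) \ge E_{\gamma}\left[ \frac{1}{2} \int_0^T |u(s)|^2\,ds \right]$.

Since $R(\gamma \| \gamma_0) < \infty$ by assumption, also
$E_{\gamma}\left[ \int_0^T |u(s)|^2\,ds \right] < \infty$, which together with
(B.6) actually implies equality in (B.7).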

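Now we are in a position to choose
$((\Omega, \mathcal{F}, P), (\mathcal{F}_t), u, W) \in \tilde{\mathcal{U}}$
such that

$P \circ \left( W + \int_0^{\cdot} u(s)\,ds \right)^{-1} = \gamma$  and  $R(\gamma \| \gamma_0) \ge E\left[ \frac{1}{2} \int_0^T |u(s)|^2\,ds \right]$.

Take $\Omega = \mathcal{Y}$, let $\mathcal{F}$ be the $\gamma$-completion of
$\mathcal{B}_T$, and take $P$ equal to $\gamma$, extended to the additional
null sets. Let $(\mathcal{F}_t)$ be the $\gamma$-augmentation of
$(\mathcal{B}_t)$. Notice that $\hat{\mathcal{B}}_t \subseteq \mathcal{F}_t$,
$t \in [0,T]$, and that $(\mathcal{F}_t)$ satisfies the usual hypotheses.
Define the control process $u$ according to (B.5), and set $W = \tilde{Y}$.
Then $W$ is an $(\mathcal{F}_t)$-Wiener process under $P$ and

$P \circ \left( W + \int_0^{\cdot} u(s)\,ds \right)^{-1} = \gamma \circ \left( \tilde{Y} + \int_0^{\cdot} u(s)\,ds \right)^{-1} = \gamma \circ Y^{-1} = \gamma$

since $Y$ is the identity on $\mathcal{Y} = \Omega$. Finally, by (B.7),

$R(\gamma \| \gamma_0) \ge E\left[ \frac{1}{2} \int_0^T |u(s)|^2\,ds \right]$,

where expectation is taken with respect to $P = \gamma$.  □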



Remark B.1. Lemma B.1 allows one to derive a version of Theorem 3.1 in Boué
and Dupuis (1998), the representation theorem for Laplace functionals with
respect to a Wiener process. The starting point here as there is the
following abstract representation formula for Laplace functionals
(Proposition 1.4.2 in Dupuis and Ellis, 1997, p. 27). Let $S$ be a Polish
space, $\nu \in \mathcal{P}(S)$. Then for all $f : S \to \mathbb{R}$ bounded
and measurable,

(B.8)  $-\log \int_S e^{-f(x)}\,\nu(dx) = \inf_{\mu \in \mathcal{P}(S)} \left\{ R(\mu \| \nu) + \int_S f(x)\,\mu(dx) \right\}$.

With $S = \mathcal{Y}$, $\nu = \gamma_0$ Wiener measure as above,
Equation (B.8) and Lemma B.1 imply that

$-\log \int_{\mathcal{Y}} e^{-f(y)}\,\gamma_0(dy)$
$= \inf_{\gamma \in \mathcal{P}(\mathcal{Y})} \left\{ \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(Y^u) = \gamma} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt \right] + \int_{\mathcal{Y}} f(y)\,\gamma(dy) \right\}$
$= \inf_{\gamma \in \mathcal{P}(\mathcal{Y})} \; \inf_{u \in \tilde{\mathcal{U}} :\, \mathrm{Law}(Y^u) = \gamma} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt + f(Y^u) \right]$
$= \inf_{u \in \tilde{\mathcal{U}}} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt + f(Y^u) \right]$.

Let $\hat{W}$ be a standard $d$-dimensional Wiener process over time $[0,T]$
defined on some probability space $(\hat{\Omega}, \hat{\mathcal{F}}, \hat{P})$.
Since $\int_{\mathcal{Y}} e^{-f(y)}\,\gamma_0(dy) = E_{\hat{P}}\left[ e^{-f(\hat{W})} \right]$,
it follows that for all $f : \mathcal{Y} \to \mathbb{R}$ bounded and
measurable,

(B.9)  $-\log E_{\hat{P}}\left[ e^{-f(\hat{W})} \right] = \inf_{u \in \tilde{\mathcal{U}}} E\left[ \frac{1}{2} \int_0^T |u(t)|^2\,dt + f(Y^u) \right]$.

The difference with the formula obtained in Boué and Dupuis (1998) lies in
the fact that the control processes there all live on the canonical space and
are adapted to the canonical filtration, while here the stochastic bases for
the control processes may vary.
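The abstract representation formula (B.8) is easy to verify numerically when $S$ is a finite set. The Python sketch below is an illustration under assumptions: the three-point space, the measure `nu`, and the function `f` are hypothetical toy data, and the helper names are not from the text. It uses the standard fact that, on a finite set, the infimum in (B.8) is attained at the measure proportional to $e^{-f}\nu$.

```python
import math


def laplace_functional(f, nu):
    """Left-hand side of (B.8): -log sum_x exp(-f(x)) nu(x)."""
    return -math.log(sum(math.exp(-f[x]) * w for x, w in nu.items()))


def gibbs_measure(f, nu):
    """Candidate minimizer mu*(x) proportional to exp(-f(x)) nu(x)."""
    weights = {x: math.exp(-f[x]) * w for x, w in nu.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def objective(mu, f, nu):
    """R(mu || nu) + sum_x f(x) mu(x), for mu << nu on a finite set."""
    entropy = sum(m * math.log(m / nu[x]) for x, m in mu.items() if m > 0.0)
    return entropy + sum(f[x] * m for x, m in mu.items())


# Hypothetical toy data on a three-point space.
nu = {"x1": 0.2, "x2": 0.5, "x3": 0.3}
f = {"x1": 1.0, "x2": -0.5, "x3": 2.0}
```

Evaluating `objective(gibbs_measure(f, nu), f, nu)` should reproduce `laplace_functional(f, nu)` up to rounding, and any other probability vector on the same points should give a value that is at least as large.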



References 

P. Billingsley. Convergence of Probability Measures. Wiley series in Proba- 
bility and Statistics. John Wiley & Sons, New York, 1968. 

M. Boué and P. Dupuis. A variational representation for certain functionals
of Brownian motion. Ann. Probab., 26(4):1641-1659, 1998.

A. Budhiraja, P. Dupuis, and M. Fischer. Large deviation properties of
weakly interacting processes via weak convergence methods. Ann. Probab.,
40(1):74-102, 2012.

P. Dai Pra and F. den Hollander. McKean-Vlasov limit for interacting
random processes in random media. J. Stat. Phys., 84(3/4):735-772, 1996.

D. A. Dawson and J. Gärtner. Large deviations from the McKean-Vlasov
limit for weakly interacting diffusions. Stochastics, 20(4):247-308, 1987.

P. Del Moral and A. Guionnet. Large deviations for interacting particle
systems: Applications to non linear filtering. Stochastic Processes Appl.,
78(1):69-95, 1998.






P. Del Moral and T. Zajic. A note on the Laplace-Varadhan integral lemma.
Bernoulli, 9(1):49-65, 2003.

A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications,
volume 38 of Applications of Mathematics. Springer, New York, 2nd
edition, 1998.

B. Djehiche and I. Kaj. The rate function for some measure-valued jump
processes. Ann. Probab., 23(3):1414-1438, 1995.

P. Dupuis and R. S. Ellis. A Weak Convergence Approach to the Theory of 
Large Deviations. Wiley Series in Probability and Statistics. John Wiley 
& Sons, New York, 1997. 

R. S. Ellis. Entropy, Large Deviations and Statistical Mechanics, volume 271 
of Grundlehren der mathematischen Wissenschaften. Springer, New York, 
1985. 

S. Feng. Large deviations for Markov processes with mean field interac- 
tion and unbounded jumps. Probab. Theory Relat. Fields, 100(2):227-252, 
1994a. 

S. Feng. Large deviations for empirical process of mean field interacting 
particle system with unbounded jumps. Ann. Probab., 22(4):2122-2151, 
1994b. 

H. Föllmer. An entropy approach to the time reversal of diffusion processes.
In M. Métivier and E. Pardoux, editors, Stochastic differential systems:
Proc. IFIP-WG 7/1 Conf., Marseille-Luminy, 1984, volume 69 of Lecture
Notes in Control and Inform. Sci., pages 156-163. Springer-Verlag, Berlin,
1985.

H. Föllmer. Time reversal on Wiener space. In S. A. Albeverio, P. Blanchard,
and L. Streit, editors, Stochastic Processes - Mathematics and Physics:
Proc. 1st BiBoS-Symposium, Bielefeld, 1984, volume 1158 of Lecture Notes
in Math., pages 119-129. Springer-Verlag, Berlin, 1986.

J. Jacod and A. N. Shiryaev. Limit Theorems for Stochastic Processes,
volume 288 of Grundlehren der mathematischen Wissenschaften.
Springer-Verlag, Berlin, 2nd edition, 2003.

O. Kallenberg. On the existence of universal functional solutions to classical
SDEs. Ann. Probab., 24(1):196-205, 1996.






I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, 
volume 113 of Graduate Texts in Mathematics. Springer, New York, 2nd 
edition, 1991. 

C. Léonard. Large deviations for long range interacting particle systems with
jumps. Ann. Inst. Henri Poincaré, Probab. Stat., 31(2):289-323, 1995a.

C. Léonard. On large deviations for particle systems associated with spatially
homogenous Boltzmann type equations. Probab. Theory Relat. Fields,
101(1):1-44, 1995b.

H. P. McKean. A class of Markov processes associated with nonlinear
parabolic equations. Proc. Nat. Acad. Sci. U.S.A., 56(6):1907-1911, 1966.

H. Tanaka. Limit theorems for certain diffusion processes with interaction.
In K. Itô, editor, Stochastic Analysis: Proc. Taniguchi Symposium, Katata
& Kyoto, 1982, volume 32 of North-Holland Mathematical Library, pages
469-488. Elsevier, Amsterdam, 1984.

A. S. Üstünel. Entropy, invertibility and variational calculus of adapted
shifts on Wiener space. J. Funct. Anal., 257(11):3655-3689, 2009.

S. R. S. Varadhan. Asymptotic probabilities and differential equations. 
Comm. Pure Appl. Math., 19:261-286, 1966. 






